Is Log Analysis for Web Analytics a Dead Subject?
April 1, 2009

I’ve been involved in web analytics in one capacity or another for a good while. Back in 1998, when I was first getting started in my career & working for Webtrends, there were only two ways to get stats from your website: either get a program to crunch your site’s logs (of which there were many), or pay some ridiculous sum for a tool like Aria, NetGenesis, or HitList, which used packet sniffers placed in front of your web server to track various interactions with your site.
In either case, the entire subject of web analytics assumed that every interaction your users had with your site would result in another page request back to the server, which you could then track via a log file.
Wow, how times have changed. You can complete an hour-long stay at many websites, and still have only looked at one HTML page. Doesn’t make log-based analytics too entertaining, especially when your videos are hosted off-site.
In late 2004, I did a project for a company in Atlanta, testing out about 30 different web analytics solutions to work out the best one for them. At that point in time, of all the different packages I reviewed, there was a pretty even split: about half of the web analytics companies were making a go of it with a shrink-wrapped, software-based solution hosted on the client side, and the other half were ASPs.
Many organizations I was working with were not too interested in turning to an ASP-based setup, for various security reasons, as well as the fact that ASPs aren’t too great at doing analytics for intranet sites, where the client can’t access the external internet.
At that point in time, the landscape looked something like this:
| Analytics Product | ASP or Software | Analysis Type |
| --- | --- | --- |
| Webtrends | Either | Log Analysis or Page-tagging |
| Datanautics G2 | Software | Log Analysis & Packet Sniffer |
| DeepMetrix LiveStats xSP | Software | Log Analysis |
| Pilot HitList | Software | Log Analysis & Packet Sniffer |
| Sane NetTracker | Software | Log Analysis or Page-tagging |
| Sawmill | Software | Log Analysis |
| SPSS NetGenesis | Software | Log Analysis, Page-tagging and Packet Sniffer |
| Urchin | Software or ASP | Log Analysis or Page-tagging |
| Eloqua | ASP | Page-tagging |
| Elytics EAS | ASP | Page-tagging |
| Manticore Virtual Touchstone | ASP | Page-tagging |
| Omniture SiteCatalyst | ASP | Page-tagging, but they’d import your old logs for a fee |
| SageMetrics SageAnalyst | ASP | Log Analysis and Page-tagging |
| WebSideStory HBX | ASP | Page-tagging, but they’d import your old logs for a fee |
As you can see, it was about half-and-half, with most products still clinging to log analysis, but many more progressive (and sometimes completely frightening) products going to page-tagging exclusively to be able to trap and coordinate interactions with your sites.
But now, boy has the landscape changed. After Google bought Urchin and transformed it into a free product, countless companies have now successfully experimented with Google Analytics and found it (and with it, the whole premise of tagging pages) to be a reliable and insightful way of tracking interactions with pages.
Also, the fact that it’s free has forced a lot of companies to either (a) drop the web analytics business altogether, or (b) dramatically change their model so as to differentiate themselves from Google Analytics and move way upmarket.
Check out how this grid looks now, 5 years later:
| Analytics Product | Still around? If so, what type of product | Analysis Type |
| --- | --- | --- |
| Webtrends | ASP or Software | Log Analysis or Page-tagging |
| Datanautics G2 | Product is discontinued | |
| DeepMetrix LiveStats xSP | Bought by Microsoft, and then deep-sixed | |
| Pilot HitList | Bought by SAP, and then deep-sixed | |
| Sane NetTracker | Acquired by Unica, now merged into their Net Insight software product | Log Analysis or Page-tagging |
| Sawmill | Still software | Log Analysis |
| SPSS NetGenesis | Toasted by SPSS? Product page is now a 404 | |
| Urchin | Bought by Google, is now Google Analytics | Page-tagging |
| Eloqua | ASP | Page-tagging |
| Elytics EAS | Dead as a doornail | |
| Manticore Virtual Touchstone | ASP | Page-tagging |
| Omniture SiteCatalyst | ASP | Page-tagging, but they’d import your old logs for a fee |
| SageMetrics SageAnalyst | ASP | Log Analysis and Page-tagging |
| WebSideStory HBX | Bought by Omniture, old HBX product is toast | |
In any case, it’s a subject for another blog post as to what companies have to do to differentiate themselves from Google Analytics in order to make it worth the cash for users to upgrade from a free product.
The main question here, though, is whether log analysis still has any value or relevance in the market. What do you get from a log analysis tool these days that you can’t get from a pixel tracker?
The web server is always going to produce a log file, so there will always be a place for log file analysers. This is as true of the web as it is of any other market – firewalls, mail servers, and much of the new breed of security devices.
One problem you get with Google et al. is that there is no history, and the moment you stop there is no future. Sure, the analysis is good, but you are stuck with it once you choose it. Log analysis and the logs are always there if things change – and why not run both?
Sawmill also now has a JavaScript page-tagging option. You can run it in-house (so you can do your intranet too) or host it, so Sawmill does both (and you can still run Google too!).
The link to Sawmill is here: http://www.sawmill.co.uk
Graham
Graham – I definitely agree; there is always going to be a place for analysis of a log file of some manner. I’m just of the opinion, right now, that while the former purpose of log analysis was marketing & behaviour analysis, the main driving force I see now is IT.
As you can tell from some of my newer posts, I’m becoming a fan of Splunk – which all of a sudden has made so many of my logs much more relevant again.
Hi JETTEROHELLER,
If we analyze log files coming from a web server, I think it’s OK to use an ASP solution or Google Analytics (in this case we know the historical information will be lost someday).
But if we want to measure the audience of a video (mp4, flv, or mp3 if audio) played directly through a player – Windows Media Player, QuickTime Player, VLC, or a Flash player (not tagged by you) – log analysis is the only way.
Does somebody know of a good product focused on video and audio analytics based on log files?
I’ve been able to get such reports (rudimentary ones) from AWStats, better ones from Sawmill, and quite decent ones from WebTrends – which is expensive, but amazing.
The problem with tagging links to PDFs or other non-HTML files is that it doesn’t take into account any direct traffic from search engines or other pages. If the JavaScript can’t execute, what can you do?
I think that’s one of the main areas where you do indeed still have to monitor your logs. But for this type of report, which you can’t easily integrate into your Webtrends/Google Analytics/Omniture reports, you just get the data with a tool like Splunk. Over the last year, I’ve become quite a Splunk addict, as it makes questions like that easy to ask – especially ad-hoc ones you weren’t expecting management to ask you for.
As you state, obviously a PDF reader is not going to execute JavaScript – so indeed the only way to do this is by analyzing logs.
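A minimal sketch of what pulling those PDF hits out of a raw access log might look like, assuming Apache combined-format lines (the regex, file names, and sample lines here are illustrative, not from any specific product):

```python
import re
from collections import Counter

# Illustrative regex for Apache combined-format log lines; adjust the
# field layout for your own server's log format.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def pdf_hits(lines):
    """Yield (path, referrer) for each successful PDF request."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group('path').lower().endswith('.pdf') \
             and m.group('status').startswith('2'):
            yield m.group('path'), m.group('referrer')

# Two made-up sample lines: one PDF download referred by a search
# engine, one ordinary HTML page view.
log = [
    '1.2.3.4 - - [01/Apr/2009:10:00:00 -0500] "GET /whitepaper.pdf HTTP/1.1" '
    '200 1024 "http://www.google.com/search?q=whitepaper" "Mozilla/5.0"',
    '1.2.3.5 - - [01/Apr/2009:10:00:01 -0500] "GET /index.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0"',
]
referrers = Counter(ref for _, ref in pdf_hits(log))
print(referrers.most_common())
```

That referrer count is exactly the direct-search-engine traffic a page tag never sees, since no JavaScript ran.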
I know very little about Splunk.
We run a Windows server cluster and had started to move from WebTrends to GA. Tracking external traffic to our PDFs is a big problem that we did not anticipate. Now we are evaluating – do we go back to WT?
Splunk is pretty rad. It’s free as long as you’re ingesting less than 500MB/day worth of logs, with absurdly powerful query/charting/reporting capability. I’d try downloading it & pointing it at the logs that contain your external PDF hits. Especially if you do some pre-processing of the logs to grep -v out data you don’t need, I’m sure you won’t go over 500MB/day. And Splunk lets you run any sort of query you want in real time, instead of the WebTrends way, which relied on canned reports rather than ad-hoc queries.
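The grep -v pre-processing mentioned above can be sketched like this (the extension list and sample lines are just examples; tune them to whatever you don’t need indexed):

```python
# Drop access-log lines for static assets before feeding them to an
# indexer, to keep daily ingest volume down. The extension list is an
# example -- extend it for your own site.
SKIP = ('.css', '.js', '.png', '.gif', '.jpg', '.ico')

def keep(line):
    """True unless the requested path ends in a throwaway extension."""
    try:
        # Combined-format request field looks like: "GET /path HTTP/1.1"
        path = line.split('"')[1].split()[1]
    except IndexError:
        return True   # keep malformed lines so you can inspect them later
    return not path.lower().endswith(SKIP)

lines = [
    '1.2.3.4 - - [t] "GET /report.pdf HTTP/1.1" 200 1 "-" "-"',
    '1.2.3.4 - - [t] "GET /style.css HTTP/1.1" 200 1 "-" "-"',
]
filtered = [l for l in lines if keep(l)]
```

Filtering before ingest, rather than at query time, is what keeps you under a volume cap like the 500MB/day mentioned above.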
Do you know if Splunk can handle multiple logs created by load balancing systems?
It depends on what’s in the logs. If you’re trying to stitch together sessions that are spread across multiple load-balanced servers, it’s easiest if you’re logging a cookie or other unique identifier that Splunk can use to establish that the hits came from a single session. Otherwise, you could tie it together with IP & user agent, but that’s not as reliable. Either way, yes – you can handle multiple logs on load-balanced servers. That’s what I’m doing at work right now: we’ve got four active Apache servers serving the site at any one time, and Splunk ingests all their logs simultaneously & can report on them easily.
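The stitching idea itself is simple enough to sketch. This assumes a hypothetical simplified log line of "timestamp path cookie" per hit – in real logs you’d parse the session cookie out of the full line – and merges hits from several servers’ logs by cookie value:

```python
from collections import defaultdict

def stitch_sessions(*server_logs):
    """Merge hits from several servers' logs, grouped by session cookie.

    Each log line is assumed (for illustration) to be
    'timestamp path cookie'; adapt the split to your real format.
    """
    sessions = defaultdict(list)
    for log in server_logs:
        for line in log:
            ts, path, cookie = line.split()
            sessions[cookie].append((ts, path))
    for hits in sessions.values():
        hits.sort()   # order each session's hits by timestamp
    return dict(sessions)

# One visitor (cookie abc123) bounced between two load-balanced servers;
# another visitor (zzz999) hit only the second server.
web1 = ['10:00:01 /home abc123', '10:00:09 /buy abc123']
web2 = ['10:00:05 /product abc123', '10:00:02 /home zzz999']
sessions = stitch_sessions(web1, web2)
```

After the merge, abc123’s clickstream reads /home, /product, /buy in order, even though those hits landed on different servers – which is the same reconstruction an indexer does when it groups events by a session field.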