Is Log Analysis for Web Analytics a Dead Subject?


I’ve been involved in web analytics in one capacity or another for about 11 years.  Back in 1998, when I was first getting started while working for Webtrends, there were only two ways to get stats from your website: either get a program to crunch the logs for your site (of which there were many), or pay some ridiculous sum for a tool like Aria, NetGenesis, or HitList, which used packet sniffers placed in front of your web server to track various interactions with your site.

In either case, the entire subject of web analytics assumed that every interaction that your users were doing with your site was going to result in another page request back to the server, which you could then track via a log file.

Wow, how times have changed.  Try going to a site like the main video channel for the Church of Scientology, or this one for the Volunteer Ministers.  You can complete an hour-long stay at either website and still have looked at only one HTML page.  That doesn’t make log-based analytics too entertaining, especially when your videos are hosted off-site.

In late 2004, I did a project for a company in Atlanta, testing literally about 30 different web analytics solutions to work out the best one for them.  At that point in time, out of all the different packages I reviewed, there was a pretty even split: half of the web analytics companies were making a go of it with a shrink-wrapped, software-based solution hosted on the client side, and the other half were ASPs.

Many organizations I was working with weren’t too interested in turning to an ASP-based setup, for various security reasons, and also because ASPs aren’t much good at doing analytics for intranet sites, where the client can’t access the external internet.

At that point in time, the landscape looked something like this:

Analytics Product            | ASP or Software | Analysis Type
Webtrends                    | Either          | Log Analysis or Page-tagging
Datanautics G2               | Software        | Log Analysis & Packet Sniffer
DeepMetrix LiveStats xSP     | Software        | Log Analysis
Pilot HitList                | Software        | Log Analysis & Packet Sniffer
Sane NetTracker              | Software        | Log Analysis or Page-tagging
Sawmill                      | Software        | Log Analysis
SPSS NetGenesis              | Software        | Log Analysis, Page-tagging and Packet Sniffer
Urchin                       | Software or ASP | Log Analysis or Page-tagging
Eloqua                       | ASP             | Page-tagging
Elytics EAS                  | ASP             | Page-tagging
Manticore Virtual Touchstone | ASP             | Page-tagging
Omniture SiteCatalyst        | ASP             | Page-tagging, but they’d import your old logs for a fee
SageMetrics SageAnalyst      | ASP             | Log Analysis and Page-tagging
WebSideStory HBX             | ASP             | Page-tagging, but they’d import your old logs for a fee


As you can see, it was about half-and-half, with most products still clinging to log analysis, but many more progressive (and sometimes completely frightening) products going to page-tagging exclusively to be able to trap and coordinate interactions with your sites.

But now, boy has the landscape changed.   After Google bought Urchin and transformed it into a free product, countless companies have now successfully experimented with Google Analytics and found it (and with it, the whole premise of tagging pages) to be a reliable and insightful way of tracking interactions with pages.

Also, the fact that it’s free has forced a lot of companies to either (a) drop the web analytics business altogether, or (b) dramatically change their model so as to differentiate themselves from Google Analytics and move way upmarket.

Check out how this grid looks now, 5 years later:

Analytics Product            | Still around?  If so, what type of product                               | Analysis Type
Webtrends                    | ASP or Software                                                          | Log Analysis or Page-tagging
Datanautics G2               | Product is discontinued                                                  |
DeepMetrix LiveStats xSP     | Bought by Microsoft, and then deep-sixed                                 |
Pilot HitList                | Bought by SAP and then deep-sixed                                        |
Sane NetTracker              | Acquired by Unica, now merged into their Net Insight software product    | Log Analysis or Page-tagging
Sawmill                      | Still software                                                           | Log Analysis
SPSS NetGenesis              | Toasted by SPSS?  Product page is now a 404                              |
Urchin                       | Bought by Google, is now Google Analytics                                | Page-tagging
Eloqua                       | ASP                                                                      | Page-tagging
Elytics EAS                  | Dead like a doornail                                                     |
Manticore Virtual Touchstone | ASP                                                                      | Page-tagging
Omniture SiteCatalyst        | ASP                                                                      | Page-tagging, but they’d import your old logs for a fee
SageMetrics SageAnalyst      | ASP                                                                      | Log Analysis and Page-tagging
WebSideStory HBX             | Bought by Omniture; old HBX product is toast                             |

In any case, it’s a subject for another blog post as to what companies have to do to differentiate themselves from Google Analytics in order to make it worth the cash for users to upgrade from a free product.

The main question here, though, is whether log analysis still has any value or relevance in the market.   What do you get from a log analysis tool these days that you can’t get from a pixel tracker?

10 thoughts on “Is Log Analysis for Web Analytics a Dead Subject?”

  1. The Web Server is always going to produce a log file, so there will always be a place for log file analysers; this is as true of the Web as it is for any other market – firewalls, mail servers, and much of the new breed of security devices.

    One problem you get with Google et al. is that there is no history, and the moment you stop, there is no future. Sure, the analysis is good, but you are stuck with it once you choose it. Log analysis and the logs are always there if things change – and why not run both?

    Sawmill also now has a JavaScript page-tagging option; you can run it in house (so you can do your Intranet too) or host it, so Sawmill does both (and you can still run Google too!).

    The link to Sawmill is here:


  2. Graham – I definitely agree; there is always going to be a place for analysis of a log file of some manner. I’m just of the opinion, right now, that whilst the former purpose of log analysis was marketing & behaviour analysis, at this point the main driving force I see is IT.

    As you can tell from some of my newer posts, I’m becoming a fan of Splunk – which all of a sudden has made so many of my logs much more relevant again.


  3. If we analyze log files coming from a webpage, I think it’s OK if we use an ASP solution or Google Analytics (in this case we know the historical information will be lost someday).
    But if we want to measure the audience of a video (MP4, FLV, or MP3 if audio) played directly through a player – Windows Media Player, QuickTime Player, VLC, or a Flash player (not tagged by you) – log analysis is the only way.
    Does anybody know of a good product focused on video and audio analytics based on log files?

    1. I’ve been able to get such reports (rudimentary ones) from AWStats. Better ones from Sawmill, and quite decent ones from WebTrends — which is expensive, but amazing.

  4. The problem with tagging links to PDFs or other non-HTML files is that it doesn’t take into account any direct traffic from search engines or other pages. If the JavaScript can’t execute, what can you do?

    1. I think that’s one of the main areas where you do indeed still have to monitor your logs. But for this type of report, which you can’t really integrate into your Webtrends/Google Analytics/Omniture reports, you just get the data with a tool like Splunk. Over the last year, I’ve become quite a Splunk addict, as it makes questions like that easy to answer – especially ad-hoc ones, and ones you weren’t expecting management to ask you for.

      As you state, obviously a PDF reader is not going to execute JavaScript – so indeed the only way to do this is by analyzing logs.
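To illustrate the point, here’s a minimal sketch in Python of pulling PDF download counts straight out of a web server log – the kind of report a page tag can never see. It assumes Apache’s combined log format, and the log path is hypothetical:

```python
import re
from collections import Counter

# Apache combined log format: IP, identity, user, [time], "request",
# status, bytes, "referrer", "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def count_pdf_hits(log_path):
    """Tally successful PDF downloads by referrer, straight from the access log."""
    hits = Counter()
    with open(log_path) as f:
        for line in f:
            m = LOG_LINE.match(line)
            if not m:
                continue  # skip lines that aren't in combined format
            if m.group("path").lower().endswith(".pdf") and m.group("status") == "200":
                hits[m.group("referrer")] += 1
    return hits

# Example: print the top referrers driving PDF traffic
# for ref, n in count_pdf_hits("access.log").most_common(10):
#     print(n, ref)
```

The referrer column is what answers the “direct traffic from search engines” question above: every hit on the PDF carries the page (or search engine) that linked to it, whether or not any JavaScript ever ran.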

      1. I know very little about Splunk.

        We run a Windows server cluster and had started to move from WebTrends to GA. Tracking external traffic to our PDFs is a big problem that we did not anticipate. Now we are evaluating – do we go back to WT?

        1. Splunk is pretty rad. Free as long as you’re ingesting less than 500MB/day worth of logs, and absurdly powerful query / charting / reporting capability. I’d try downloading it & pointing it at the logs that contain your external PDF hits. Especially if you do some pre-processing of the logs to grep -v out data that you don’t need, I’m sure you won’t hit over 500MB/day. And Splunk lets you do any sort of query you want in real-time, instead of the WebTrends way, which relied only on canned reports and less on ad-hoc queries.

            1. It depends on what’s in the logs. If you’re trying to stitch together sessions that appear on multiple load-balanced servers, it’s easiest if you’re logging a cookie or other unique identifier that Splunk can use to establish that the hits came from a single session. Otherwise, you could tie it together with IP & User-Agent, but that’s not as reliable. Either way, yes – you can handle multiple logs on load-balanced servers. That’s what I’m doing at work now: we’ve got 4 active Apache servers serving the site at any one time, and Splunk ingests all their logs simultaneously & can report on them easily.
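As a rough illustration of the IP + User-Agent fallback described above, here’s a minimal Python sketch that merges hits from any number of load-balanced servers and groups them into sessions. The 30-minute inactivity timeout is just the common web analytics convention, not anything Splunk-specific:

```python
from collections import defaultdict

# Seconds of inactivity before a new session starts (common convention)
SESSION_TIMEOUT = 30 * 60

def stitch_sessions(hits):
    """Group hits from any number of load-balanced servers into sessions.

    Each hit is a (timestamp, ip, user_agent, path) tuple; we merge all the
    servers' logs, sort by time, and key visitors on IP + User-Agent since
    there's no shared cookie to rely on.
    """
    by_visitor = defaultdict(list)
    for ts, ip, agent, path in sorted(hits):
        by_visitor[(ip, agent)].append((ts, path))

    sessions = []
    for (ip, agent), visitor_hits in by_visitor.items():
        current = [visitor_hits[0]]
        for prev, nxt in zip(visitor_hits, visitor_hits[1:]):
            if nxt[0] - prev[0] > SESSION_TIMEOUT:
                # Gap too long: close the current session, start a new one
                sessions.append((ip, agent, current))
                current = []
            current.append(nxt)
        sessions.append((ip, agent, current))
    return sessions
```

In practice you’d let a tool like Splunk do this kind of grouping for you, but the logic is the same either way: sort the merged logs by time, key on whatever visitor identifier you have (cookie if you’re lucky, IP + User-Agent if not), and split on inactivity.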
