I noted in an earlier mega-post on Adobe Experience Manager deployment best practices that you should build a large dashboard in your APM and log aggregation tools covering everything that could potentially affect system performance. On this page I’m going to start cataloging some of the searches you can use in Splunk to visualize and track down issues with AEM.
Searching AEM Request Logs & Grouping the Request with the Response
AEM request logs are always a pain to view sequentially, as AEM logs the request and the response separately. There’s a fairly simple search that uses the Splunk transaction command to group each request with its response, keyed on the request ID that gets logged with both.
sourcetype=aem_request_log | transaction requestid maxpause=10m maxevents=2 keepevicted=true
To make the search above work, however, you’ll need a few things configured first:
- Create a sourcetype for the AEM request log: you’ll likely need admin rights for this, to set up your Splunk forwarders to forward crx-quickstart/logs/request.log* into Splunk as an aem_request_log sourcetype.
- Create a field extraction for the request ID: you’ll then want to use the interactive field extraction wizard in Splunk to extract the request ID as its own field (which I named requestid above).
An example AEM author/publish request log entry for a request & response is:
01/Nov/2018:11:58:57 -0700 [3158] -> GET /content/dam/test/Whitefish%20Mountain%20Biking.mp4/jcr%3acontent/renditions/cq5dam.thumbnail.319.319.png?ch_ck=1541096749000 HTTP/1.1
01/Nov/2018:11:58:57 -0700 [3158] <- 304 - 3ms
In the example above, I have the sourcetype set up to extract as:
- “3158” gets extracted as “requestid”
- Further extractions pull “/content/dam/test…” out as the requestpath, and the numerical “3” in “3ms” as the responsetime. Make sure to extract just the number from the response time so that you can run mathematical functions such as averages, maximums and minimums against responsetime.
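If you would rather see the extractions inline than rely on the wizard, the same three fields can be sketched with the rex command. This is only a rough sketch: it assumes the standard request.log layout, where the request ID sits in square brackets after the timestamp, and the regular expressions are my own guesses rather than what the wizard generates. The field names match the ones used on this page.

```
sourcetype=aem_request_log
| rex "\[(?<requestid>\d+)\]"
| rex "-> \S+ (?<requestpath>[^\s?]+)"
| rex "<- \d{3} .*?(?<responsetime>\d+)ms\s*$"
| stats avg(responsetime) as avg_ms, max(responsetime) as max_ms, min(responsetime) as min_ms by host
```

Because responsetime holds only the digits, the closing stats line can average and rank it directly. Note that the requestpath pattern here stops at the query string, which may or may not be what you want.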
Graph of Author/Publish Average Response Time
sourcetype=aem_request_log | timechart span=5m avg(responsetime) by host
Once the response time is being extracted from your request logs, use the search above to chart average response times by host, for example across the publish tier.
Graphing Dispatcher Cache Hit Ratio
An easy way to get a running graph of your cache hit ratio on the AEM Dispatcher / Publisher tier is to analyze your dispatcher.log to find out how many requests the dispatcher serves out of cache vs the number of requests that it serves after requesting the resource from the backend publisher.
(index=aem source=/var/log/httpd/dispatcher.log host=aem-prd*) 0ms | timechart count as Cached | appendcols [search (index=aem source=/var/log/httpd/dispatcher.log host=aem-prd*) NOT 0ms | timechart count as Uncached]
This is a fairly inefficient search (I’ll get around to improving it) but it gets the job done. Anything with a response time logged as “0ms” in the dispatcher.log was served from cache, whilst anything else would have been served from the publish backend.
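One possible way to avoid the subsearch is to classify each event in a single pass and let timechart do the counting. Treat this as an untested sketch: it keeps the same rule as above (“0ms” means served from cache), and the 5 minute span and the hit_ratio_pct field name are assumptions of mine.

```
index=aem source=/var/log/httpd/dispatcher.log host=aem-prd*
| timechart span=5m count(eval(searchmatch("0ms"))) as Cached, count as Total
| eval Uncached=Total-Cached, hit_ratio_pct=round(100*Cached/Total, 1)
| fields _time Cached Uncached hit_ratio_pct
```

This reads the dispatcher.log events once instead of twice, and it gives you the hit ratio as a percentage alongside the raw counts. It inherits the same weakness as the original search: every line in the log counts toward Total, so noisy log levels will skew the numbers.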
Note: the above search will not work at Log Level 3, as there will be too much other noise in the log, so you’ll have to get some more specificity about what constitutes an uncached request.