Setting up AEM Author Workflow Offloading
July 10, 2018
An issue that has persisted since the earliest days of AEM is that the authoring environment has never been good at scaling horizontally. "Authoring environment" here also covers everything else the AEM author instance usually does: image workflows, PDF rendering, video transcoding, and the like.
One model for attempting to horizontally scale the AEM author is to do Workflow Offloading, where you offload the heavy-duty tasks from the AEM author onto a separate AEM instance which is there only to process workflows and then return the payload back to the primary author instance. This has the purported benefit of being able to take major CPU-intensive and I/O-intensive ops and have them executed by a secondary server which is NOT the one your lag-sensitive authoring users are clicking around on.
However, please be warned – setting up offloading is fraught with pitfalls, and you’ll want to be very-super-extra-sure that you really want to go the offloading route before you try, because usually you’ll just be better-served by beefing up your author box and optimizing your workflows.
Diagrams of AEM Offloading Setup Architectures
Above is the way that Adobe recommends setting up your workflow offloading, based on the AEM Assets Offloading Best Practices document (linked at the end of this post). Specifically:
- You’ve already split your segmentstore and datastore at AEM installation time
- You are using FileDatastore on an externalized NAS/NFS mount, and are sharing this datastore with the AEM Offload Instance
- You are using binary-less replication to get the workflows and their payloads back and forth between the AEM Author master (leader) instance that your users are logging in to, and the Offload Author(s) that are handling the offloaded workflows.
It is also theoretically possible to do a similar setup using S3Datastore, though such a scenario isn’t explicitly documented by Adobe. (EDIT: after working with Adobe support, a working model was demonstrated on test gear; the steps for such a setup are documented below.)
This setup would look like the following:
Theoretically, offloading also works using an S3 Datastore. When one has an AEM Assets environment spanning multiple (or tens of) terabytes, it’s obviously advantageous to be able to use S3’s low-cost storage to store AEM assets once, rather than multiplying this storage across a shared-nothing publish environment, all on higher-cost EBS storage.
However, even getting workflow offloading to work at all using S3 is an undocumented and yet extremely effective source of pain and agony. One reason is that when an item is uploaded, it’s first written to a local S3 cache while an async call is sent off to persist the binary to S3. There’s no flag in the workflow to wait until the item has been persisted to S3 before kicking off the offload. The offload then attempts binary-less replication of the workflow to the offload author, which tries to access a binary that may not be available yet. Race condition excellence.
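To make the race concrete, here is a minimal Python simulation. It is not AEM code: the class `AsyncS3Store`, its methods, and the guard `wait_until_persisted` are all hypothetical stand-ins for the local cache, the async upload, and the "wait until persisted" check that the workflow lacks.

```python
class AsyncS3Store:
    """Toy model (hypothetical): writes land in a local cache, then reach S3 later."""

    def __init__(self):
        self.local_cache = {}  # visible only to the instance that wrote the blob
        self.s3_bucket = {}    # visible to every instance sharing the datastore
        self.pending = []      # uploads queued but not yet persisted to S3

    def write(self, blob_id, data):
        self.local_cache[blob_id] = data
        self.pending.append(blob_id)

    def flush_one(self):
        """Simulate one queued async upload completing."""
        if self.pending:
            blob_id = self.pending.pop(0)
            self.s3_bucket[blob_id] = self.local_cache[blob_id]

    def is_persisted(self, blob_id):
        return blob_id in self.s3_bucket


def worker_read(store, blob_id):
    """The worker cannot see the master's local cache, only the shared bucket."""
    return store.s3_bucket.get(blob_id)


def wait_until_persisted(store, blob_id, max_ticks):
    """Hypothetical guard: let uploads progress until the blob is in S3."""
    for _ in range(max_ticks):
        if store.is_persisted(blob_id):
            return True
        store.flush_one()
    return store.is_persisted(blob_id)


store = AsyncS3Store()
store.write("asset-1", b"image bytes")

# Offloading immediately: the binary-less reference arrives before the binary.
early_read = worker_read(store, "asset-1")

# Offloading only after the guard: the worker can resolve the reference.
persisted = wait_until_persisted(store, "asset-1", max_ticks=5)
late_read = worker_read(store, "asset-1")

print(early_read, persisted, late_read)
```

In the toy model the early read comes back empty and the late read succeeds, which is the whole problem in miniature: nothing in the stock workflow plays the role of the guard.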
Steps to Configure Workflow Offloading on Authors with binary-less replication & a Shared Amazon S3 Datastore
After working with Adobe support extensively on this issue, the following basic instructions were used to get AEM author workflow offloading working on test instances which were connected based on the diagram above, using a shared S3 datastore.
Configure Master and Worker instances with S3 Datastore
- Create folders for two AEM instances: master, worker
- Configure the S3 datastore as per this Adobe documentation: https://helpx.adobe.com/experience-manager/6-3/sites/deploying/using/data-store-config.html
- Start master AEM instance. Verify that it’s ready and accessible.
- Start AEM worker. Verify that it’s ready and accessible.
Administering Topology
- Log in to the OSGi console on the Master instance: http://localhost:4502/system/console/topology
- Note the instance Sling ID and confirm that the master AEM is shown as local. Note that if the master and worker reside on two different servers, you need to configure the Discovery.Oak Service at http://<host>:<port>/system/console/configMgr/org.apache.sling.discovery.oak.Config
- Verify that the worker instance is shown under Connectors.
- Log in to the OSGi console on the worker: http://localhost:4504/system/console/topology
- Note the instance Sling ID and confirm that the Worker AEM is shown as local. Note that if the master and worker reside on two different servers, you need to configure the Discovery.Oak Service at http://<host>:<port>/system/console/configMgr/org.apache.sling.discovery.oak.Config
- Verify that the worker's outgoing topology connector points to the master AEM.
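Since the same two console pages have to be visited on each instance, a small helper can print the checklist of URLs. This is only a convenience sketch: the host/port pairs mirror the localhost examples above, and the function names are made up for illustration.

```python
from urllib.parse import urlunsplit

# Console paths taken from the steps above.
TOPOLOGY_PATH = "/system/console/topology"
DISCOVERY_CONFIG_PATH = (
    "/system/console/configMgr/org.apache.sling.discovery.oak.Config"
)

# Assumed layout: both instances on one machine, as in the localhost URLs above.
INSTANCES = {"master": ("localhost", 4502), "worker": ("localhost", 4504)}


def console_url(host, port, path):
    """Build an http:// console URL for one instance."""
    return urlunsplit(("http", f"{host}:{port}", path, "", ""))


def checklist(instances):
    """Return one line per page that needs to be checked on each instance."""
    lines = []
    for role, (host, port) in instances.items():
        lines.append(f"{role} topology:  {console_url(host, port, TOPOLOGY_PATH)}")
        lines.append(
            f"{role} discovery: {console_url(host, port, DISCOVERY_CONFIG_PATH)}"
        )
    return lines


for line in checklist(INSTANCES):
    print(line)
```

Swap in real hostnames when the master and worker live on different servers, which is also the case where the Discovery.Oak configuration page actually matters.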
Configure Topic Consumption
- On the master AEM, switch to Offloading Browser at: http://localhost:4502/libs/granite/offloading/content/view.html
- Locate Topic: com/adobe/granite/workflow/offloading
- Disable Topic for the master instance:
Turning off automatic agent management
Adobe recommends that you turn off automatic agent management because it does not support binary-less replication and can cause confusion when setting up a new offloading topology. Moreover, it does not automatically support the forward replication flow required by binary-less replication.
- Open Configuration Manager from the URL http://localhost:4502/system/console/configMgr.
- Open the configuration for OffloadingAgentManager (http://localhost:4502/system/console/configMgr/com.adobe.granite.offloading.impl.transporter.OffloadingAgentManager).
- Disable automatic agent management.
Repeat the same steps for the worker instance.
Offloading replication agents
- Open replication agents in miscadmin console: http://localhost:4502/miscadmin#/etc/replication/agents.author
- Delete all 3 agents named “offloading_replication_agent”, “offloading_outbox”, and “offloading_reverse_*” on the Master
- On the Master, create a new replication agent with the Title and Name set to “offloading_*”, where * is the Sling ID of the Worker
- Edit the properties of this replication agent as follows:
Property | Value |
Settings > Serialization Type | Binary less |
Transport > Transport URI | http://<ip of worker instance>:<port>/bin/receive?sling:authRequestLogin=1&binaryless=true |
Transport > Transport User | Replication user on target instance |
Transport > Transport Password | Replication user password on target instance |
Extended > HTTP Method | POST |
Triggers > Ignore Default | True |
Repeat the same steps on the worker, with the following changes:
- The Name and Title are “offloading_*”, with * being the Sling ID of the Master
- The Transport URI needs to point to the Master
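The two agents are mirror images of each other, which is easy to get backwards. The sketch below derives the settings for each side from the target's Sling ID and address, following the naming rule and the property table above. The Sling IDs and hostnames are placeholders, and the function names are invented for this example; nothing here talks to AEM.

```python
def agent_name(target_sling_id):
    """Name and Title of the agent: 'offloading_' plus the TARGET's Sling ID."""
    return f"offloading_{target_sling_id}"


def transport_uri(target_host, target_port):
    """Transport URI pointing at the TARGET instance's receive servlet."""
    return (
        f"http://{target_host}:{target_port}/bin/receive"
        "?sling:authRequestLogin=1&binaryless=true"
    )


def agent_settings(target_sling_id, target_host, target_port):
    """Settings for the agent created on the instance that SENDS to the target."""
    name = agent_name(target_sling_id)
    return {
        "Name": name,
        "Title": name,
        "Serialization Type": "Binary less",
        "Transport URI": transport_uri(target_host, target_port),
        "HTTP Method": "POST",
        "Ignore Default": True,
    }


# Placeholder values; use the Sling IDs noted on the topology pages.
on_master = agent_settings("WORKER-SLING-ID", "worker.example", 4504)
on_worker = agent_settings("MASTER-SLING-ID", "master.example", 4502)

for side, settings in (("master", on_master), ("worker", on_worker)):
    print(side, settings["Name"], settings["Transport URI"])
```

The thing to double-check is that the agent living on the master carries the worker's Sling ID and URI, and vice versa.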
Offloading the Processing of DAM Assets
- On the master AEM, open the Workflow console and switch to the Launchers tab: http://localhost:4502/libs/cq/workflow/content/console.html
- There are four launchers for the DAM Update Asset workflow.
- For each launcher, change the Workflow dropdown value from “DAM Update Asset” to “DAM Update Asset Offloading”.
- On the worker, open the DAM Update Asset workflow, uncheck Transient Workflow, then save the changes.
- Open the Workflow launcher console on the worker.
- Disable all DAM Update Asset workflow launchers.
Using forward replication
- On the Worker, open the configuration for OffloadingDefaultTransporter (http://localhost:4504/system/console/configMgr/com.adobe.granite.offloading.impl.transporter.OffloadingDefaultTransporter).
- Change value of the property default.transport.agent-to-master.prefix from offloading_outbox to offloading.
Turning off transport packages
- On master, open the component configuration of OffloadingDefaultTransporter component at http://localhost:4502/system/console/configMgr/com.adobe.granite.offloading.impl.transporter.OffloadingDefaultTransporter
- Disable the property Replication Package (default.transport.contentpackage).
- Repeat the same step on the worker.
Disabling the transport of workflow model
- On master, open the workflow console from http://localhost:4502/libs/cq/workflow/content/console.html.
- Open the Models tab.
- Open the DAM Update Asset Offloading workflow model.
- Open step properties for the DAM Workflow Offloading step.
- Open the Arguments tab, and unselect the Add Model To Input and Add Model To Output options:
- Save the changes to the model.
Optimizing the polling interval
Workflow offloading is implemented using an external workflow on the master, that polls for the completion of the offloaded workflow on the worker. The default polling interval for the external workflow processes is five seconds. Adobe recommends that you increase the polling interval of the Assets offloading step to at least 15 seconds to reduce the offloading overhead on the master.
- Open the workflow console from http://localhost:4502/libs/cq/workflow/content/console.html.
- Open the Models tab.
- Open the DAM Update Asset Offloading workflow model.
- Open the step properties for the DAM Workflow Offloading step.
- Open the Commons tab, and adjust the value of the Period property.
- Save the changes to the model.
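A quick back-of-the-envelope calculation shows what the longer period buys. The sketch assumes the master makes one status check per period until the offloaded workflow finishes; the 120-second workflow duration and the batch of 500 assets are made-up numbers for illustration.

```python
import math


def polls_for(workflow_seconds, period_seconds):
    """Status checks made by the master while one offloaded workflow runs.

    Assumes one check per period, rounded up to cover the final check.
    """
    return math.ceil(workflow_seconds / period_seconds)


WORKFLOW_SECONDS = 120  # hypothetical time the worker needs per asset
ASSETS_IN_BATCH = 500   # hypothetical bulk upload size

default_polls = polls_for(WORKFLOW_SECONDS, 5)   # default period: 5 seconds
tuned_polls = polls_for(WORKFLOW_SECONDS, 15)    # recommended minimum: 15 seconds

print("polls per asset:", default_polls, "vs", tuned_polls)
print("polls per batch:", default_polls * ASSETS_IN_BATCH,
      "vs", tuned_polls * ASSETS_IN_BATCH)
```

Under these assumptions the master does a third of the polling work, at the cost of noticing a finished workflow up to 15 seconds late instead of up to 5.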
Testing
- Upload an image to Assets on the master.
- Verify that the same image appears in Assets on the Worker.
- Check the worker's workflow console > Archive tab and note the DAM Update Asset workflow instance.
- Check the master's workflow console > Archive tab and note the DAM Update Asset Offloading instance.
- Check the error.log on the Worker and ensure the processing is executed there:
…
17.08.2018 23:12:39.349 *INFO* [JobHandler: /etc/workflow/instances/server0/2018-08-17/update_asset_5:/content/dam/myfolder/BinaryLogs.jpg/jcr:content/renditions/original] com.adobe.xmp.worker.files.ncomm.XMPFilesNComm [PERF][EXECUTE_START] | C:\Users\ela\AppData\Local\Temp\cq-dam-wf-file8971817675171803421.tmp | XMP extraction
…
- Check the error.log on the Master and ensure the renditions are retrieved from Worker:
…
17.08.2018 23:12:37.534 *INFO* [JobHandler: /etc/workflow/instances/server0/2018-08-17/dam-xmp-writeback_6:/content/dam/myfolder/BinaryLogs.jpg/jcr:content/metadata] com.day.cq.dam.core.process.XMPWritebackProcess payload path :/content/dam/myfolder/BinaryLogs.jpg/jcr:content/metadata
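Eyeballing error.log gets tedious after the third test upload, so here is a small sketch for scanning it. The markers are simply substrings lifted from the two sample log lines above; whether they appear in your logs depends on your log levels and AEM version, so treat them as assumptions. The helper names are invented for this example.

```python
# Substrings taken from the sample log lines above (assumed to be stable).
WORKER_MARKER = "XMP extraction"
MASTER_MARKER = "com.day.cq.dam.core.process.XMPWritebackProcess"


def read_log(path):
    """Read a log file into a list of lines, tolerating odd bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read().splitlines()


def asset_seen(asset_path, marker, log_lines):
    """True if any line contains both the marker and the asset path."""
    return any(marker in line and asset_path in line for line in log_lines)


# The two sample lines from this post, used here instead of real files.
worker_log = [
    r"17.08.2018 23:12:39.349 *INFO* [JobHandler: /etc/workflow/instances/"
    r"server0/2018-08-17/update_asset_5:/content/dam/myfolder/BinaryLogs.jpg/"
    r"jcr:content/renditions/original] com.adobe.xmp.worker.files.ncomm."
    r"XMPFilesNComm [PERF][EXECUTE_START] | C:\Users\ela\AppData\Local\Temp"
    r"\cq-dam-wf-file8971817675171803421.tmp | XMP extraction",
]
master_log = [
    "17.08.2018 23:12:37.534 *INFO* [JobHandler: /etc/workflow/instances/"
    "server0/2018-08-17/dam-xmp-writeback_6:/content/dam/myfolder/"
    "BinaryLogs.jpg/jcr:content/metadata] com.day.cq.dam.core.process."
    "XMPWritebackProcess payload path :/content/dam/myfolder/BinaryLogs.jpg/"
    "jcr:content/metadata",
]

ASSET = "/content/dam/myfolder/BinaryLogs.jpg"
processed_on_worker = asset_seen(ASSET, WORKER_MARKER, worker_log)
written_back_on_master = asset_seen(ASSET, MASTER_MARKER, master_log)
heavy_work_on_master = asset_seen(ASSET, WORKER_MARKER, master_log)

print(processed_on_worker, written_back_on_master, heavy_work_on_master)
```

Against real instances, replace the sample lists with `read_log(...)` calls on each instance's error.log. The interesting failure is the third flag coming back true, which would mean the master is still doing the heavy processing itself.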
Adobe Documentation on Workflow Offloading
- AEM Assets Offloading Best Practices: https://helpx.adobe.com/experience-manager/6-4/assets/using/assets-offloading-best-practices.html
- Overview on Offloading Jobs in AEM: https://helpx.adobe.com/experience-manager/6-4/sites/deploying/using/offloading.html
- Creating & Consuming jobs for Offloading: https://helpx.adobe.com/experience-manager/6-4/sites/developing/using/dev-offloading.html