POINT OF view
Adobe Social Collaboration: A Deep Dive Into Performance and Scalability
Sruthisagar Kasturirangan, Infrastructure Architect, Infrastructure Practice, SapientNitro, Bangalore
INTRODUCTION
Adobe’s Social Collaboration unifies all social networking and collaboration applications within AEM (Adobe Experience Manager). It has gained attention in part because today’s consumers are increasingly active on mobile devices and place considerable value on feedback from fellow buyers. Smart content and commerce platforms are capitalizing on Social Collaboration to boost sales and give end users the best possible experience.
To understand Adobe’s Social Collaboration better, we analyzed its performance and scalability. We ran the benchmark tests described below using the JMeter scripting framework provided by Adobe. The tests include scripts that perform pure write operations, which makes it possible to measure the overall throughput that can be supported and, from that, to arrive at a physical architecture sizing and capacity plan. From these tests, we provide general guidance on the methodology needed to size the infrastructure and identify key bottlenecks when integrating Social Collaboration into the overall design of a content and collaboration platform.
This paper is not intended to contest the results that Adobe Systems Incorporated provides in its documentation, but to extend those results to virtualized environments, given the rapid growth of cloud hosting. The results that follow were analyzed and discussed in detail before we arrived at the conclusions presented here.
Experimental Setup
First, let’s briefly go through the experimental setup used for the benchmark tests: the AEM version, the system configuration, the benchmark architecture, and the test scenario.
© Sapient Corporation, 2013
AEM Version
AEM 5.6.0
System Configuration
Author & Publish Environments:
CPUs Currently (Logical CPUs): 8
CPUs Configured: 8
Number of Processors (Allocated): 2
Processor: PowerPC_POWER7
Hardware: 64 bit
AIX Kernel Version: 7.1.2.1 TL02
Memory Size: 8192MB
Total Paging Space: 2048MB
JVM Settings: Maximum Heap Size: 4GB; PermGen: 512MB; IBM J9VM 1.6, GENCON Algorithm
Benchmark Architecture
[Figure: Single publish configuration. User requests reach the publish node, which is connected to the author node by reverse replication.]
Test Scenario
All of the tests below were performed using Geometrixx, Adobe’s out-of-the-box application. Adobe’s benchmark scripts include procedures that create multiple users in the author and publish environments so that a realistic test scenario can be set up. In this case, a test forum topic was created with a short description. Each user was pre-authenticated during the warm-up and, once authenticated, held the session and performed continuous write operations.
Iterations
The test iterations are presented below, with the load model and results for each described in the following sections. The result sections focus on transactions per second (TPS) and average response time (i.e., time taken for last byte), each as a function of the total number of transactions.
Load Model
#Generic properties: threads/users.
#All timings are in seconds.
#startThreadCount is the total number of concurrent threads/users. (For 5 requests per second, set it to 150.)
#startupDelay is the ramp-up time for starting threads. (For 150 threads, set it to 60 seconds.)
#holdLoadFor is the time the test is run. (For 10 minutes, set it to 600.)
#shutdownTime is the time it takes the threads to shut down. (Set it to the same value as startupDelay.)
#requestsPerSec is the number of requests per number of seconds.
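As an illustration of how these properties relate to one another, the following minimal sketch parses a set of key=value settings and derives the ramp-up interval and total test duration. It is not part of Adobe's benchmark scripts: the `LoadModel` class and `parse_properties` helper are hypothetical names, the plain key=value file layout is an assumption, and the ramp-up is assumed to be linear. The sample values are the ones listed for Iteration 2.

```python
from dataclasses import dataclass


@dataclass
class LoadModel:
    """Load-model settings; all timings are in seconds."""
    start_thread_count: int   # startThreadCount: concurrent threads/users
    startup_delay: int        # startupDelay: ramp-up time
    hold_load_for: int        # holdLoadFor: time the load is held
    shutdown_time: int = 0    # shutdownTime: ramp-down time

    @property
    def seconds_per_new_user(self) -> float:
        """Average gap between thread starts, assuming a linear ramp-up."""
        return self.startup_delay / self.start_thread_count

    @property
    def total_duration(self) -> int:
        """Ramp-up plus hold plus shutdown."""
        return self.startup_delay + self.hold_load_for + self.shutdown_time


def parse_properties(text: str) -> LoadModel:
    """Parse key=value lines; blank lines and lines starting with '#' are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = int(value.strip())
    return LoadModel(
        start_thread_count=values["startThreadCount"],
        startup_delay=values["startupDelay"],
        hold_load_for=values["holdLoadFor"],
        shutdown_time=values.get("shutdownTime", 0),
    )


# Settings listed for Iteration 2 (assumed file layout).
iteration_2 = parse_properties("""
#All timings are in seconds.
startThreadCount=150
startupDelay=1200
holdLoadFor=1200
shutdownTime=0
""")

print(iteration_2.seconds_per_new_user)  # 8.0 (one new user every 8 seconds)
print(iteration_2.total_duration)        # 2400 (40 minutes in total)
```

Deriving the ramp interval from the settings, rather than reading it off a chart, makes it easier to compare iterations that use different thread counts and ramp-up times.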
Iteration 1
startThreadCount (the total number of concurrent users/threads)=150
startupDelay=60
holdLoadFor=1200
shutdownTime=0
requestsPerSec=2
RPSduration=30
[Figure: Load Ramp Up Model, expected parallel users count. Number of active threads vs. elapsed time.]
[Figure: Throughput Throttling, expected RPS. Number of requests/sec vs. elapsed time.]
Note: This test was run with Ultimate Thread Group by throttling requests per second to 2.
Results
[Figure: TPS vs. transactions.]
[Figure: AVG_RESPONSE_TIME vs. transactions.]
[Figure: Response Times vs. Elapsed Time (granularity: 100 ms). Response times in ms for add Topic to Publish Node, get Topic Page, and setTotalTime.]
The graphs above show that response times stay within an acceptable range only when the load is throttled to limit throughput to about 2 TPS (transactions per second). Throttling was performed using a JMeter plugin (Ultimate Thread Group), but the throttled rate does not indicate the number of concurrent user sessions. Additional testing was therefore required to understand the behavior associated with changing user patterns.
Iteration 2
startThreadCount (the total number of concurrent users/threads)=150
startupDelay=1200
holdLoadFor=1200
shutdownTime=0
[Figure: Load Ramp Up Model, expected parallel users count. Number of active threads vs. elapsed time.]
Note: This test was run without Ultimate Thread Group and no throttling was applied.
Results
[Figure: TPS vs. transactions.]
[Figure: AVG_RESPONSE_TIME vs. transactions.]
[Figure: Response Times vs. Elapsed Time (granularity: 500 ms). Response times in ms for add Topic to Publish Node, get Topic Page, and setTotalTime.]
In this iteration the load was not throttled, and users were ramped up at a rate of 1 user every 8 seconds. As the graphs above show, once all 150 users were active, response times grew beyond acceptable limits for page performance.
Iteration 3
startThreadCount (the total number of concurrent users/threads)=10
startupDelay=100
holdLoadFor=600
shutdownTime=0
[Figure: Load Ramp Up Model, expected parallel users count. Number of active threads vs. elapsed time.]
Note: This test was run without Ultimate Thread Group and no throttling was applied.
Results
[Figure: TPS vs. transactions.]
[Figure: AVG_RESPONSE_TIME vs. transactions.]
[Figure: Response Times vs. Elapsed Time (granularity: 500 ms). Response times in ms for add Topic to Publish Node, get Topic Page, and setTotalTime.]
In this iteration the load was again not throttled, and users were ramped up at a rate of 1 user every 10 seconds. As the graphs above show, once all 10 users were active, response times again exceeded acceptable limits for page performance. In this scenario, it did not make sense to test with fewer than 10 concurrent users. Because average response times were on the order of 3.5 seconds, we concluded that a single publish server can support fewer than 10 concurrent users.
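The relationship between concurrent users, response time, and throughput in these iterations can be sanity-checked with Little's Law (requests in flight = throughput × response time). The sketch below is illustrative only and is not taken from the benchmark scripts: the function names are hypothetical, and it assumes each user sends its next request as soon as the previous one completes (zero think time), in line with the continuous-write scenario. The inputs are the figures stated in this paper: 10 users at roughly 3.5 seconds per request, and 1.6 TPS at a 2-second response time.

```python
def throughput_upper_bound(concurrent_users: int,
                           avg_response_s: float,
                           think_time_s: float = 0.0) -> float:
    """Little's Law for a closed system: X = N / (R + Z).

    N is the number of users, R the average response time, and
    Z the think time between requests (assumed zero by default).
    """
    return concurrent_users / (avg_response_s + think_time_s)


def requests_in_flight(tps: float, avg_response_s: float) -> float:
    """Little's Law: N = X * R, the average number of requests being served."""
    return tps * avg_response_s


# Iteration 3: 10 users, average response time on the order of 3.5 seconds.
print(round(throughput_upper_bound(10, 3.5), 2))  # 2.86

# 1.6 TPS at a 2-second response time.
print(round(requests_in_flight(1.6, 2.0), 1))     # 3.2
```

Under these assumptions, 10 back-to-back users at 3.5 seconds per request cannot exceed about 2.86 TPS, and sustaining 1.6 TPS at 2 seconds corresponds to only about 3 requests being served at any moment, which is consistent with the finding that a single publish server supports fewer than 10 concurrent users.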
Overall System Utilization
[Figure: Publish node (hdadhdcom03), CPU Total, 19-7-2013. User%, Sys%, and Wait% vs. time.]
[Figure: Author node (hdadhdcom01), CPU Total, 19-7-2013. User%, Sys%, and Wait% vs. time.]
CONCLUSION
After conducting, analyzing, and discussing this series of tests, we arrived at a few key takeaways that we think are worth considering:
1. Total achievable throughput: a single publish instance and a single author instance are able to achieve 1.6 TPS within an acceptable response time (below 2 seconds).
2. Total achievable concurrent user/thread count: a single publish instance is able to handle fewer than 10 concurrent threads/users performing continuous read operations and updates while keeping response times within SLAs (service-level agreements).
3. Scaling publish servers horizontally in order to handle higher volumes of updates is of no value, because the bottleneck would then be reverse replication to the author instance. (The throughput indicated above is for the entire publish layer, not for a single publish instance.)
Adobe’s Social Collaboration can help to achieve social media goals and improve strategy, performance, and scalability. It is our hope that this paper has answered some of your questions and helped you better understand this particular social solution.
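As a rough illustration of how the first and third takeaways feed into a capacity plan, the sketch below converts the 1.6 TPS figure into an hourly write volume and checks a projected peak against it. The helper names and the sample peak volumes are hypothetical, and the calculation assumes the ceiling applies to the whole publish layer regardless of how many publish instances are deployed, as stated in takeaway 3.

```python
# Sustained write throughput reported in this paper for the whole publish layer.
MAX_SUSTAINED_WRITE_TPS = 1.6


def writes_per_hour(tps: float) -> float:
    """Convert a sustained transactions-per-second rate to writes per hour."""
    return tps * 3600


def fits_within_ceiling(peak_writes_per_hour: float,
                        ceiling_tps: float = MAX_SUSTAINED_WRITE_TPS) -> bool:
    """Return True if the projected peak hourly write volume stays at or below the ceiling.

    Adding publish instances does not change the answer under the
    assumption above, because the author instance receives all
    reverse-replicated writes.
    """
    return peak_writes_per_hour / 3600 <= ceiling_tps


print(round(writes_per_hour(MAX_SUSTAINED_WRITE_TPS)))  # 5760
print(fits_within_ceiling(5000))  # True  (hypothetical peak of 5,000 writes/hour)
print(fits_within_ceiling(6000))  # False (hypothetical peak of 6,000 writes/hour)
```

A projected peak above the ceiling would call for reducing or deferring write volume rather than adding publish instances, since the author instance remains the limiting component.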
References
1. CQ Planning and Capacity Guide http://dev.day.com/docs/en/cq/current/managing/capacity-guide.html
2. CQ Hardware Sizing Guidelines http://wem.help.adobe.com/enterprise/en_US/10-0/wem/managing/hardware_sizing_guidelines.html
3. Introduction to Adobe’s Social Communities http://dev.day.com/docs/en/cq/current/administering/social_communities.html
ABOUT THE AUTHOR
Sruthisagar Kasturirangan is an Infrastructure Architect in the Infrastructure Practice at SapientNitro Bangalore. A graduate of Iowa State University, he gained extensive experience within leading IT organizations before moving back to his home country to join Sapient Corporation. He has over 11 years of experience in systems administration of Unix platforms and application servers such as WebSphere and WebLogic, and extensive exposure to capacity planning and performance tuning of Java applications.