Let’s imagine you have a startup up and running and everything is going smoothly. But you have an upcoming update, a promotion or some other event, and you want to be sure that your servers will handle the additional load.
Today we will walk you through a real-world example of how we handled this situation in one of the startup projects we worked on.
Infrastructure based on 3 Windows Amazon EC2 servers, ready to handle 200K users:
– a Web API that runs on IIS;
– 6 separate Windows services;
– 7 separate MSSQL databases;
Expected Web API load:
– 2M users (500K new registrations in 2 days);
– 5% active users, 95% running the application in the background;
– ~3 login req/sec;
– ~2–2.5K total req/sec;
Constraints:
– limited budget;
– no code changes;
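As a sanity check, the total throughput roughly follows from the user numbers above. The polling interval in this sketch is our assumption (a plausible value that reproduces the stated rate), not a figure from the requirements:

```python
# Back-of-envelope check of the expected load.
# POLL_INTERVAL_S is an assumption: how often each active user hits the API.
TOTAL_USERS = 2_000_000
ACTIVE_SHARE = 0.05          # 5% of users actively use the app
POLL_INTERVAL_S = 45         # assumed average interval between requests per active user

active_users = int(TOTAL_USERS * ACTIVE_SHARE)   # 100,000 active users
total_rps = active_users / POLL_INTERVAL_S       # ~2.2K req/sec

print(active_users, round(total_rps))
```

With a ~40–50 s interval per active user this lands in the stated 2–2.5K req/sec range.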
The following tools were used
As the project was already in production, we decided to perform all tests on a full copy of the production environment. That allowed us to keep real users unaffected while running tests and performing infrastructure experiments.
JMeter was installed on a dedicated server, as it requires a lot of resources by itself.
As the testing was performed on a separate environment, we had to generate dummy data with additional services. Those services created Facebook test users, befriended them and generated activity on the users’ walls. The users’ tokens were saved to a .csv file for later use during login. The services also faked the authorisation flow and user activity for social networks other than Facebook, since there was no way to create real test users there.
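The token-dumping step can be sketched as follows. This is a hypothetical reconstruction: `create_facebook_test_user` stands in for the real test-user API calls our services made, and the file layout simply matches what a JMeter CSV Data Set Config element can consume:

```python
import csv

def create_facebook_test_user(i):
    # Placeholder: in the real services this called the social network's
    # test-user API and returned a genuine access token.
    return {"email": f"testuser{i}@example.com", "token": f"token-{i:06d}"}

def dump_users_csv(path, count):
    # One user per row: email, token. JMeter reads these columns
    # into variables for the login requests.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(count):
            u = create_facebook_test_user(i)
            writer.writerow([u["email"], u["token"]])

dump_users_csv("login_users.csv", 1000)
```

Writing the tokens out once and replaying them from the .csv keeps the load generator cheap: no live authorisation calls during the test itself.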
We encountered problems
The first issue appeared right after the test started: login requests took a long time to complete, and the more login requests there were, the longer each of them took. The source of the problem was an incorrect encryption algorithm used for the authorisation token. Despite the requirement not to change code, we decided to fix that critical issue.
During the load testing some other performance issues were found. They were related to DB performance and required refactoring a few stored procedures to eliminate deadlocks, as well as some additional table indexes.
Monitoring of the test results
All servers were monitored with Windows Performance Monitor, using a set of performance counters for IIS, MSSQL and the overall system state. Unfortunately, there are no predefined performance counters for Windows services, but that wasn’t a big deal: you can add your own counters in code and then use them in Performance Monitor. Fortunately, they had already been added during development.
In our case the most useful counters were:
– .NET CLR Data – information on SQL connection pools;
– APP_POOL_WAS – IIS worker processes;
– ASP.NET and ASP.NET Applications – request state counters:
Request Execution Time
Request Wait Time
Moreover, the counters found under the Memory and Process Information groups can give you detailed information on how CPU and memory are used.
Custom counters were mostly rate counters, used to track the amount of incoming data and the success/failure processing rate for that data.
A list of all ASP.NET performance counters can be found in Microsoft’s documentation.
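If you export a Performance Monitor log to CSV (for example with `typeperf -o` or `relog -f csv`), a small script can summarise each counter over the test run. The layout assumed below (timestamp in the first column, one counter per remaining column) is the usual CSV export shape, but treat this as a sketch:

```python
import csv
from statistics import mean

def summarise_counters(path):
    # First row: header (timestamp column, then one column per counter).
    # Remaining rows: sampled values. Returns {counter_name: average}.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    summary = {}
    for col, name in enumerate(header[1:], start=1):
        values = [float(r[col]) for r in data if r[col].strip()]
        summary[name] = mean(values)
    return summary
```

Averaging is the simplest summary; for request-time counters it is often worth also tracking the maximum to catch spikes.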
The test results made it obvious that not only did the performance bottlenecks have to be fixed, but infrastructure and configuration changes were also required.
Therefore, we decided that a new infrastructure should be built, capable of handling the required load:
– 3 separate server instances for the Web API, all behind a load balancer;
– DBs split across 3 different instances, with DB logs written to a dedicated server;
We also gave recommendations regarding the size of the Amazon EC2 instances.
After the new servers had been set up, the tests were run again to see the final results.
The new configuration was able to support 3.5–4K req/sec against the Web API (7 login req/sec) with an average request time of 150 ms.
Moreover, there was still room for improvement: each DB could be moved to its own instance, and the Windows services could be spread across different servers and even duplicated (our infrastructure supports this).
A few tips based on our experience:
– while doing tests, add your requests one by one and run short cycles of about 15–20 minutes; this will show whether any single request has a critical bottleneck;
– the longer you run your test, the more accurate the results you get;
– use .csv files for large amounts of data in JMeter (e.g. data used for login/registration: user email, password, etc.);
– if you need heavy scripting, use the JSR223 Sampler with compilation enabled instead of a BeanShell script, or go even further and create your own Java Sampler; this will significantly improve JMeter’s performance and decrease its resource usage;
– pay attention to frequent requests: they are the ones most likely to cause resource queues in API code or provoke DB locks;
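To spot those frequent requests, you can rank endpoints by how often they appear in a request log. This is a hypothetical sketch that assumes the request path is the second whitespace-separated field of each line, which roughly matches a simplified access-log format:

```python
from collections import Counter

def top_requests(lines, n=3):
    # Count how often each request path occurs and return the n most frequent.
    # Assumes lines like "GET /api/login HTTP/1.1" (path is the second field).
    paths = Counter(line.split()[1] for line in lines if line.strip())
    return paths.most_common(n)
```

The endpoints at the top of this ranking are the first candidates to inspect for lock contention and connection-pool queuing.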