Practical Microservice Architecture and Implementation Considerations

Much has been written about microservice architecture as seen in blogs here ( and here ( This is a promising approach to deal with large scale integration of software so that it is easily maintainable, high performing, and scalable. Although the approach has some concrete foundation, there are some serious practical considerations that involve scalability and performance that one must take into account in order to ensure good SLAs. Anyone who has tuned high-end platforms understands that this approach is not straightforward.

Consider the industry over the last 15 years as I briefly review emerging software engineering trends that are important to give context to the move toward microservices. Today, microservices and functional programming trends are prominent in many software development discussions, where these concepts of software engineering patterns were born out of empirical findings and limitations of object oriented programming (OOP) and service oriented architectures (SOA).

Both OOP and SOA had challenges and failed to comprehensively address programming techniques that deal with agility of the codebase with minimal brittle effect to the scalability and performance of the runtime application kernel. (Developers like myself understand the meaning of “brittle effect.”) From a programming perspective we saw the rise of OOP largely through data abstraction, encapsulation, inheritance, and polymorphism. However, large-scale systems became difficult to develop, and with the famous debacle of the earlier EJB specification, a new concept emerged based on dependency injection / inversion-of-control (IoC), which was largely popularized by the Spring framework. This trend involved code that looked more like services, which were functional pieces of code that were well encapsulated to have a well-defined interface with sets of inputs and outputs, and the various service relationships and dependencies injected by the container. During this time we saw less emphasis on inheritance, but more on what functionality a developer is trying to deliver to the caller with scalability and maintainability. Likewise, we saw the concept of encapsulating the functionality into a well-defined set of services.

During the same time, the phenomenon of SOA was well popularized but there was not a clear and distinct best practice for a concrete implementation of this. Many implementations did succeed, but some were very difficult and failed. Some had services that were just too large and monolithic, while others had too many smallish services (almost microservices like) that it became difficult to achieve good performance. The concept of SOA was there, but designers and implementers failed to understand the full lifecycle of the service and its granularity and scalability impact on other services, and therefore paid a huge price during implementation.

It remains difficult to define what a service is, as this largely depends on the developer and how well they understand the end-to-end lifecycle of the service they are offering. Part of the problem is that the developer lacks full understanding of the underlying infrastructure services – for example, when to scale up or scale out, what the cost of scaling out is, and how it impacts response time. On the other hand, infrastructure service providers do not have insight into application components and how they should be best mapped.

Consider the following important practical aspects when building new microservices-capable platforms, or in some cases platform as a service (PaaS) deployments. Issues such as fragmented horizontal scalability, licensing considerations, performance, and code deployment flexibility.

1. Fragmented Horizontal Scalability

I came across a customer that had implemented the concept of microservices where they defined 25 REST services as part of their services layer. Front end web applications would connect to these REST services through a back end middleware/services layer to conduct transactions. After close examination of the call paradigm, we found that the vast majority of their transactions, approximately 90% of transactions would require traversal of all 25 REST services to complete an end-to-end business transaction. On the first day that the platform was launched they decided to have a one-to-one relationship between REST services and the JVM onto which it was deployed. Thus, RESTService1 was deployed on JVM1, and so on through RESTService25 deployed on JVM25.

Much of the influence of the one-to-one approach came from various writings of how microservices should be implemented, which unfortunately disregards the performance aspects and the overall total cost of ownership of the underlying infrastructure as well as licensing of the software components/containers.

On that first day when they launched the platform, they needed 25 REST services, and therefore 25 JVMs. Each of the JVMs had a heap size of 1 GB, so the total heap space across all JVMs was 25 GB. After being in operation for a few months they discovered that they needed to scale out by an additional 5 GB. Because the functionality of their scale-out architecture required a minimum of 25 JVMs (25 REST services) they were forced to add another 25 JVMs, or an additional 25 GB. Over time this pattern led to an environment made up of 400 JVMs, each with 1 GB heap, across 16 physical hosts.

Figure 1 shows on the right the 400 JVMs that constitute this platform across 16 physical hosts. The left hand side illustrates how the load balancer interacts with the 400 REST services. The green arrow shows the completion of an entire transaction within one host, while the blue arrows show the potential of bouncing to another host to complete a transaction. Clearly this many network jumps can hurt response time and overall performance. The paradigm almost always assumes that everything is remote and makes a remote service call, when in fact 90% of the time this is not the case.

Figure 1. Load Balancer Interaction with the 25 REST Services




Therefore, one issue is the need to deal with the ping pong effect on the completion of a transaction. One approach is to consolidate all of the REST services into one JVM heap space, and scale out to accommodate the 400 GB worth of heap. This naturally leads to larger JVMs, but it also has the distinct advantage that most or all of the calls will be local.


Figure 2. Highly Tuned High Performance Service


Figure 2 shows how we can deploy the 25 REST services all onto one JVM, and scale out based on how many JVMs we actually need to fulfill the 400 GB heap requirement. The basic setup is 2 VMs running on a host,  with 1 JVM per VM. In this deployment paradigm when a transaction reaches the JVM at RESTService1 the call continues all the way through to RESTService25, all within the same JVM and thus the same heap space. We tested this and found a 3x improvement in response time as compared with the old single REST service per JVM approach of Figure 1 (also shown at the right of Figure 2).

The VM sizes are no doubt NUMA optimized. I will write about NUMA optimization in an upcoming blog, but for now I show in Table 1 a deployment calculation for 400 GB and 800 GB in case there is a future need to extrapolate for traffic growth.

In this table we show how we collapsed the original 400 JVMs down to 12 JVMs of 34.56GB heap size.

Table 1. NUMA Optimized VM and JVM Size for 400 GB and 800 GB Platforms



In this approach we still had 12 JVMs, which is plenty of scale-out capability. The downside is that you have very large JVMs that can potentially lose more data than with the original paradigm. However during a close examination of the old fragmented system we found that this was a false assumption. In fact, many times a JVM would crash and take down a service, for example, RESTService3. If a transaction had completed its transit through RESTService1 and RESTService2, it sat hanging, waiting for RESTService3. The code did not have retry logic, which is not necessarily the best approach, but it can work. What we found is that although the remaining REST service JVMs were running (the remaining 24), they could not complete transactions, so they were indeed up, but useless. In the case of the refined deployment where we show all 25 REST services consolidated into one heap space, the entire transaction set can be rolled back cleanly, failed over, or made redundant. By contrast, if you try to do that with the fragmented scaled-out approach, you have to chase down which JVM and REST service is currently performing or holding up the transaction.

This comes with the need to understand how to tune and size large JVMs, which is an area where we have done much research with our customer base and have been producing and publishing the various approaches.

Figure 3 shows each section of the JVM and the VM with various sizes.



Figure 4 shows a snippet of the JVM options used, mostly using a combination GC of CMS in old generation and ParNewGC in young generation. In an upcoming blog I will explain each one of these JVM options in detail.

Figure 4. High Performance Service JVM with Various GC Options



2. Licensing Considerations

If you plan to proliferate microservices, note that each time you spin a new JVM there might be additional licensing costs – for example, if you need a license for each application server instance. In our case we were able to consolidate from 16 physical hosts down to 6. This had a direct impact on reducing the application server license cost, operating system license cost, and power consumption on the order of 60%.

Consider the practical limitations of licensing costs, as the majority of application servers are licensed by CPU cores, and having licensed a NUMA node or CPU socket you want to use up as much as possible of the memory attached to those CPU cores. Otherwise, you will have paid for licensing the usage of that memory without utilizing it. Although having larger JVMs will quickly get you the largest returns, you can also stack up multiple smaller JVMs instead. However in that case, every new JVM instance comes with its own overhead and needs additional CPU core cycles to fulfill the heavy GC cycles.

After a JVM has been launched and it has consumed the initial overhead, it continues to scale vertically very well without proportionally tracking an increase in CPU utilization. We specifically experimented with this using the GemFire in-memory database and noticed that a cluster of 8 very large JVMs would outperform a cluster of 30 smaller JVMs having the same heap size. We also conducted other performance studies that show the vertical scalability of JVMs using web applications, back in 2010.

3. Performance – How Much to Scale Up or Scale Out

More recently we tested this assumption yet again on a PaaS installation, comparing 2 JVMs of 2 GB each with 4 JVMs of 1 GB each. We found that the 2 JVM case outperformed the 4 JVM case by 26% with better response time and 60% less CPU utilization.

Figure 5 shows the response time chart. The blue lines represent Scenario-1 with 4 JVMs using 1 GB heap each as a scale out, while the red line is Scenario-2 of 2 JVMs using 2 GB heap each showing a good mix of scale up and scale out. In Figure 5 we see that Scenario-2 has 26% better response time and in Figure 6 we see that Scenario-2 has 60% less CPU utilization.

Figure 5. Response Time (y Axis) Compared with the Number of Test Iterations (x Axis)



Figure 6. CPU Utilization of Scenario-1 as Compared with Scenario-2



While scale out is an important attribute of Java applications, when to scale up or scale out is key to any successful PaaS platform. On the one end of the spectrum, if every component of your system was wrapped in a service, and each service mapped to a single JVM, you would quickly find that a small system of 1000 components turns into a system of 1000 very small JVMs. What was formerly a call to another method or function within the same heap space is now remote, and you contend with the efficiency of your network. Your network will never be as fast as an in-memory call. Even in cases where there is a call to localhost (assuming the neighboring service is running on the same operating system), it might now be load balanced to some microservice a few hops away where it would hurt response time.  This is why services, and the definition of services, must take into account the complete lifecycle of the service as driven by its transactional usage.

Transactions that traverse common services, all the time or during a majority of the time, must be as close to each other as possible, whereas if you assume that everything is remote in a soup of microservices hurts performance. Good PaaS platforms attempt to first provide a decent vertical heap size for your workload during incubation time (before you go live), and later allow you to adjust the configuration and decide when to scale up or scale out. In the background, the PaaS kernel attempts to vertically fill the NUMA socket of the physical server with either more JVM containers, if you chose to scale out, or with a smaller number of larger JVMs to the same net heap size.

Deciding between fewer larger JVMs and many smaller ones depends entirely on your workload. Mixing microservices of the same tenant onto the same JVM is relatively good practice as it allows for faster transactional response times. In the opposite case, mixing multitenant microservices would not be recommended because a multitenant JVM is not yet a mainstream approach. As we see multitenant JVMs become more common, we will start to see a shift in the market and in PaaS deployments towards larger JVMs, primarily because you can achieve greater performance at smaller infrastructure cost. The downside of course is that you have a larger JVM so if it crashes, you need to protect against the loss of a larger dataset or state.

PaaS platforms of the future should provide platform-based fault tolerance where deploying a service onto a JVM on a decent PaaS provides an option to have fault tolerance or redundancy. Of course this comes at a cost, and only those key state management services would be protected through this approach. PaaS platforms can certainly provide a mechanism that can copy the JVM memory state to a redundant copy at runtime to safety guard against failures. Whether this is done at the JVM, application container level, or VM level, or all of the above, is up to the PaaS platform engineer to decide. This is where cost, performance, and scalability drive a categorization of platform type dictated by the nature of the workload. In I discuss various categories of platforms, Category-1 web application based, and Category-2 in-memory database types of platforms. What you design for one might not be appropriate for the other.

4. Code Deployment and Flexibility Myths

The notion of independently deploying and updating a microservice independent of anything else around it has some practical limitations. Consider the soup of many microservices, perhaps multiple microservice instances of the same type. As an example, consider 3 main types, say 10 instances of Microservice1, 5 instances of Microservice 2, and 3 instances of Microservice 3. When you update a service definition, what happens to Microservice 1 and Microservice 2 while you are updating Microservice 3? How do you know which of the Microservice 1 and Microservice 2 instances must be quiesced, or told that the Microservice 3 instance they are about to access is going down? One answer is that it does not matter, but really this approach leads to a lot of transactional failures and its main issue is the fragmented complexity of not knowing the mesh graph of transactions.

In the case of the 25 REST services we discussed earlier, within the original model where every JVM had one type of REST service, the same problem existed of not being able to update a particular REST service type independent of the various other instances. Even if you could, it seems that you would have RESTService3 instance 1 with a newer version than the others. Perhaps you could hide this by not putting the updated service into the mix and rotate around and update the other instances in turn, but this requires a lot of coordination. Whereas in the case of having all of the REST services stacked onto one JVM, you can cleanly take out the entire JVM, and not that you would do this, but potentially you could have one JVM having an entirely newer REST service definition than the others, because the transactions do not cross to other JVMs.

In my lucky career I spent the first 12 years as a pure software engineer coding lots of applications in C++ and Java. Since 2005 I have been delving more into platforms and infrastructure, which has given me the unique opportunity to fill the gap that exists in the area between the code and the infrastructure, an area I like to call application runtime, or application kernel.

I will write more about this, but in the meantime I think microservices are promising. However, they should not immediately imply micro-application kernels or JVMs. In some cases, a 1:1 ratio of microservices to JVMs might be excessive, but in other cases it is warranted.

There is quite a bit more to address on this topic, much of which is discussed in my book,


Look for me at VMworld at this session:





My VMworld 2012 Sessions Covering Java


Speaker Biography: Emad Benjamin has been in the IT industry for the past twenty years. He graduated with a Bachelor of Electrical Engineering from the University of Wollongong. Earlier in his career, he was a C++ software engineer, then in 1997, he switched to programming with Java, and has been focusing on Java ever since. For the past seven years, his focus has been Java on VMware vSphere, vFabric GemFire and SQLFire. Emad has been at VMware since 2005, and is the author of the Enterprise Java Applications Architecture on VMware book.

Middleware virtualization has three key trends that we are observing with our customer base: 1) Consolidation, 2) Elasticity and Flexibility, and 3) Performance.  In this blog, we examine each one of these trends briefly and invite you to come along for technical deep dive at this year’s VMworld session APP-CAP1426 – The Benefits of Virtualization for Middleware to be presented in both San Francisco and Barcelona.


Many of our customers find that their middleware deployments have proliferated and are becoming an administrative challenge associated with higher costs.  We see a trend across customers who look to virtualization as a way of reducing the number of server instances.  At the same time, customers are taking the consolidation opportunity to rationalize the number of middleware components needed to service a particular load.  Middleware components most commonly run within a Java Virtual Machine (JVM) with an observed scale of 100 to 1000s of JVM instances and provide many opportunities for JVM instance consolidation.  Hence, middleware virtualization provides an opportunity to consolidate twice – once to consolidate server instances, and, secondly, to consolidate JVM instances.  This trend is far-reaching, because every IT shop on the planet is considering the cost savings of consolidation.

One customer in the hospitality sector went through the process of consolidating their server footprint and at the same time consolidated many smaller JVMs that were less than 1GB heap.  They consolidated many of these smaller 1GB JVMs into 2 categories, those that were 4GB, and others that were 6GB.  They performed the consolidation in such manner that the net total amount of RAM available to the application was equal to the original amount of RAM, but with fewer JVM instances.  They did all of this while improving performance and maintaining good SLAs.  They also reduced the cost of administration considerably due to the reduced number of JVM instances they had to manage, and refined environment that helped easily achieve SLA.

Another customer, in the insurance industry, was able to achieve the same as the above customer, but additionally was able to over-commit CPU in development and QA environments in order to save on third party software license costs.

On the other hand, sometimes we come across customers that have a legitimate business requirement to maintain one JVM for an application, and/or one JVM per a line of business.  In these cases, you cannot really consolidate the JVM instances, as that would cause intermixing of the lifecycle of one application from one line of business with another.  However, while such customers don’t benefit from eliminating additional JVM instances through JVM consolidation, they do benefit from more fully utilizing the available compute resource on the server hardware, that otherwise would have been underutilized in a non virtualized environment


Elasticity and Flexibility

It is increasingly common to find applications with seasonal demands. For example, many of our customers run various marketing campaigns that drive seasonal traffic towards their application.  With VMware, you can handle this kind of traffic burst, by automatically provisioning new virtual machines and middleware components when needed, and then automatically tear down these VMs when the load subsides.

In addition, the ability to change updating/patching hardware without causing outage is paramount for middleware that supports the cloud era scale and uptime.  VMware VMotion gives you the ability to move VMs around without needing to stop applicators and or the VM.  This flexibility alone makes virtualization of middleware worthwhile when managing large-scale middleware deployments. One customer in the financial space, handling millions of transactions per day, used VMotion quite often to schedule their hardware upgrades without any time downtime.  What otherwise would be a costly scheduled downtime to their business.


Customers often report improved middleware platform performance when virtualizing. Performance improvements are partly due to the updated hardware that customers will typically refresh during a virtualization project. There is also some performance improvement due to the robust VMware hypervisor.  A recent customer that reported a great level of performance provided the following testimony

With our OrderExpress project we upgraded our Middleware Services, Commerce, Portal, WCM, Service Layer, DB2 Database; migrated from AIX to Linux; virtualized on VMware; moved the application into a three-tier DMZ; increased our transactions by over 150 percent; and added significant new capabilities that greatly improved the customer experience. Changing such a wide range of technology components at once was a huge challenge. However using VMware vSphere and additional architectural changes we were successful in improving performance by over 300 percent; lowered costs in the millions; improved security, availability, and scalability; and how we plan to continue evolving this application to maintain greater than 30 percent yearly growth.”

– Jeff Battisti, Senior Enterprise Architect at Cardinal Health

In this session, I will show some actual JVM and VM sizes for middleware components both small and large JVMs.  I see the sizing question as the most commonly asked question with our customer base and hence, I plan to focus on it during the session.  Here is an example of a 4GB JVM with 5GB reservation:.


The second most common question is about GC tuning, now while GC behavior doesn’t change when virtualizing, it is something that our customers have asked us about and we have greatly assisted them in tuning for more optimal GC cycle.  I will show actual CMS and ParNewGC combination form an actual example:

I trust that by the end of this session you will be well equipped to tackle sizing, best practices, and JVM tuning.  In fact the answer to the below question will become apparent at the end of this session, which is, “Given 4vCPUs to configure with which below configuration will yield the best performance and scalability?”

Thank you for reading!

Here are my VMworld sessions for this year

APP-BCA1247- Virtualizing Large Scale Java Apps at Cardinal Health (Tuesday, Aug 28, 2:00 PM Room 3009/3011)

APP-CAP1426 – The Benefits of Virtualization for Middleware (Thursday, Aug 30, 10:30 AM Room 3010/3012)

I plan to extend the talks this year into a deeper dive about how to size JVMs and VMs for small, medium, and large-scale deployments.  I will also have a section on Java garbage collection tuning.

See you there!

Emad Benjamin.

VMJava Launch

This site is dedicated for all things concerning running Java workloads in a VMware environment.  Posts to follow will discuss various Java tuning techniques pertaining to physical environments and that are equally applicable to Virtualized Java workloads.

I finally had time to put this page together after finishing the work on the book that I have just released.  The book is entitled Enterprise Java Applications Architecture on VMware, and can be obtained (internationally) from createspace:

I wanted to take this opportunity to thank Steve Jin (Author of VI Java API) for his blog on my book, here


The book contains the following chapters:
Chapter 1: Introduction
Chapter 2: Why Virtualize Enterprise Java Applications?
Chapter 3: Enterprise Java Applications on VMware
Chapter 4: Design and Sizing of Enterprise Java on VMware
Chapter 5: High-availability Designs of Enterprise Java
Chapter 6: Enterprise Java on VMware Best Practices
Chapter 7: UNIX-to-Linux Migration Considerations
Chapter 8: Run Effectively in Production
Chapter 9: Performance Study
Chapter 10: Application Modernization and vFabric
Chapter 11: Troubleshooting Primer
Chapter 12: FAQ—Enterprise Java Applications on vSphere