I had the opportunity to present on Sitecore Feature Toggles at the Sitecore Melbourne meetup last week, possibly the largest Sitecore Melbourne meetup I’ve seen so far. A few people came up to me after the session to say that they found it useful and were keen to chat about the pros and cons of the approach. I thought I’d write up a brief summary of the key takeaways from my talk for future reference.
This is part 1 of 2 in the Sitecore Feature Toggles series. In the first part I cover the scenarios where the Feature Toggle design pattern has helped the organization; in the second part I will go into more detail on the Sitecore Feature Toggle implementation.
During my talk, I gave real-world scenarios and examples of how a simple design pattern, the Feature Toggle, helped solve some of the challenges we faced in software release management. The challenges were both technical and organizational. On the technical side, to handle the incoming visitor traffic and meet the business goals we had to deal with a large-scale Azure infrastructure, distributed systems built on an event-driven architecture, a complex Sitecore Commerce implementation, an advanced custom search implementation and more, all while keeping security, high availability, performance, analytics and, believe it or not, code quality front of mind. On the organizational side, we had to manage and coordinate tens of scrum teams working in unison to release new features to production on a short release cycle, with those teams spread across multiple countries, time zones and cultures.
Feature Toggle driven development is the approach we adopted to address some of these challenges: for every new feature we develop, we also develop a feature toggle that gives us more control over the feature’s activation. Let’s look at the scenarios where it has helped us so far.
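To make the pattern concrete, here is a minimal sketch in Python. It is not the Sitecore implementation (that is for part 2); the `FeatureToggles` class, the `new-checkout-flow` toggle name and the two checkout functions are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureToggles:
    """Hypothetical in-memory toggle store; a real one would read from configuration."""
    states: dict = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Unknown toggles default to off, so unfinished work stays hidden.
        return self.states.get(name, False)


def legacy_checkout(cart_total: float) -> str:
    return f"legacy checkout: {cart_total:.2f}"


def new_checkout(cart_total: float) -> str:
    return f"new checkout: {cart_total:.2f}"


def checkout(toggles: FeatureToggles, cart_total: float) -> str:
    # Both code paths are deployed; the toggle picks one at runtime.
    if toggles.is_enabled("new-checkout-flow"):
        return new_checkout(cart_total)
    return legacy_checkout(cart_total)
```

With an empty `FeatureToggles()` the call goes down the legacy path; passing `FeatureToggles({"new-checkout-flow": True})` activates the new one without any code change.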
Avoiding long-lived feature branches
A common problem we had was long-lived feature branches, normally associated with features that require infrastructure changes. The problem with a long-lived feature branch? Merge conflicts! By the time you have finished working in isolation and are ready to merge, you have to reconcile your work with tons of changes introduced by other teams, who have been building new features of their own in the meantime. You end up spending a lot of time fixing merge conflicts and running regression tests, and depending on what the other teams have changed, you might spend another few days incorporating their changes. With a feature toggle, unfinished work can sit behind a disabled toggle, so you can keep making small, tested changes and merge early and often, which makes merge conflicts much less likely.
Avoiding dedicated test environments
Dedicated test environments are expensive to maintain, especially if you don’t have a fully automated process for spinning up a new test environment from scratch. Avoid them.
The topic of a dedicated test environment normally comes up in the organization when a feature requires infrastructure changes. With feature toggles, you can manage infrastructure changes to a certain extent, so you don’t always need a dedicated test environment.
Again, spinning up new dedicated test environments and keeping them in sync with the latest changes is expensive. Reduce the need for them by using feature toggles.
Short Release Cycle
One of the challenges the organization has is coordinating the software release across a large number of teams while making sure everyone is working in unison towards the same deployment date, at two-week intervals.
With feature flag driven development, the scrum teams merge their code changes early and frequently, which in turn means everyone’s features are continuously deployed to the same test environment for testing, alongside tens of other features that the other teams are working on.
This approach allows the scrum teams to work in unison towards the same deployment date.
Managing risks around feature activation
Having a feature toggle associated with the new feature a team is working on means having more control over the feature’s activation. You can use the toggle as a kill switch when the feature isn’t behaving as intended. For example, imagine a team in a different timezone is blocked because a feature deployed to the test environment is causing issues. Rather than spending the rest of the day twiddling their thumbs and feeling frustrated because they can’t get any work done, they can disable the feature using its toggle, which falls back to the old/default behavior and lets them continue working.
In a production environment, you can imagine this kill switch being used to quickly mitigate a high-risk incident related to a new feature and help avoid commercial loss. If the problem relates to an operational aspect of the software, such as swapping out an expensive algorithm that causes high CPU or memory usage, you can use an Ops toggle to give the servers some breathing room while your team looks into the problem.
Once you compare that approach with doing a deployment rollback, or working late hours to come up with a hotfix, you’ll value the control that the feature toggle gives you.
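The kill switch idea can be sketched roughly as follows. This assumes a toggle store whose values can be changed at runtime; `ToggleStore`, `rank_products` and the `ops-personalised-ranking` toggle are hypothetical names, and the "expensive" path is just a sort standing in for real work.

```python
class ToggleStore:
    """Hypothetical runtime-editable toggle store (stand-in for a config service)."""

    def __init__(self) -> None:
        self._states = {}

    def set(self, name: str, enabled: bool) -> None:
        self._states[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Missing toggles are treated as off.
        return self._states.get(name, False)


def rank_products(store: ToggleStore, products: list) -> list:
    # The toggle is checked on every call, so flipping it takes effect
    # on the next request without a redeploy.
    if store.is_enabled("ops-personalised-ranking"):
        # Stand-in for the new, more expensive behavior: sort by score.
        ordered = sorted(products, key=lambda p: p[1], reverse=True)
    else:
        # Old/default behavior: keep the catalogue order.
        ordered = products
    return [name for name, _ in ordered]


store = ToggleStore()
products = [("mug", 0.2), ("lamp", 0.9), ("desk", 0.5)]

store.set("ops-personalised-ranking", True)
ranked_with_feature = rank_products(store, products)  # new code path

store.set("ops-personalised-ranking", False)          # kill switch flipped
ranked_after_kill = rank_products(store, products)    # default code path
```

The important design choice is that the toggle is read at call time rather than cached at startup; otherwise flipping it would still need a restart or a deployment.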
Incremental and safe roll-out
Having a way to roll out a feature incrementally is essential for the business. There can be various reasons for this. For example, we might want to roll out a particular feature incrementally so we can observe how it affects infrastructure capacity in production and plan server capacity more accurately. Another example: the client’s operational infrastructure might not yet be ready to handle their entire customer base, so they want to trial the feature first with a specific segment of customers, hoping to get quick feedback and verification.
Once the client is happy with the small roll-out, they can decide to roll the feature out to their entire customer base.
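One common way to implement this kind of roll-out is to hash each user into a stable bucket and compare it against a percentage, optionally with a list of trial segments. The sketch below assumes that technique; `RolloutRule`, `bucket` and `is_enabled_for` are hypothetical names, not part of any Sitecore API.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RolloutRule:
    feature: str
    percentage: int = 0                # share of users, 0 to 100
    segments: frozenset = frozenset()  # segments that always get the feature


def bucket(feature: str, user_id: str) -> int:
    """Map a user to a stable bucket in 0..99 for this feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled_for(rule: RolloutRule, user_id: str, segment: Optional[str] = None) -> bool:
    # Trial segments get the feature regardless of the percentage.
    if segment is not None and segment in rule.segments:
        return True
    # Same user and feature always give the same bucket, so raising the
    # percentage only ever adds users; nobody flips back and forth.
    return bucket(rule.feature, user_id) < rule.percentage
```

Starting with something like `RolloutRule("new-search", percentage=10)` and later raising the percentage to 100 matches the "small roll-out first, everyone later" flow described above.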
Aligning feature activation to marketing date
Let’s say the client tells you that they want the feature activated on a certain date, and that date doesn’t line up with your deployment date. What do you do? Adjust the deployment date? Do a midnight deployment and spend the entire night working overtime, which eventually causes your team to burn out?
With a feature toggle, you can avoid this problem because you’ve essentially decoupled software deployment from feature activation/release.
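As a rough illustration of that decoupling, a toggle can be tied to an activation time instead of a manual switch. This is only a sketch under that assumption; `CAMPAIGN_LAUNCH`, `is_active` and `homepage_banner` are hypothetical names and the date is made up.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical launch time agreed with marketing, stored in UTC.
CAMPAIGN_LAUNCH = datetime(2030, 1, 15, 9, 0, tzinfo=timezone.utc)


def is_active(activate_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the activation time has been reached."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= activate_at


def homepage_banner(now: Optional[datetime] = None) -> str:
    # The code ships in a regular deployment; the banner only appears
    # once the agreed activation time has passed.
    if is_active(CAMPAIGN_LAUNCH, now):
        return "campaign banner"
    return "default banner"
```

Passing `now` explicitly keeps the function easy to test; in normal use it falls back to the current UTC time.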
Pros and Cons
All of the above scenarios demonstrate the pros of using feature toggles, which I’ll list again to summarize:
- Avoiding long-lived feature branches
- Avoiding dedicated test environments
- Having a short release cycle
- Managing risks around feature activation
- Incremental and safe roll-out
- Aligning feature activation to marketing date
With the pros also come the cons:
Increased code complexity
Because a feature toggle essentially introduces new code paths into your application, you need to maintain at least two different code paths. This increases code complexity in exchange for more control over the new feature you’re building.
Increased testing effort
Because of the different code paths associated with a feature toggle, you also need to put more effort into code coverage so that your tests, including any automated test tools you have, cover the various scenarios associated with that toggle.
Increased maintenance effort
Having too many feature toggles in the application can be daunting to maintain. A feature toggle left lying in the codebase that no longer serves any purpose only decreases code maintainability and can bite you hard in the future.
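One way to keep that extra effort manageable is to run the same test against every toggle state. Here is a small sketch using Python’s standard `unittest` module; `price_label` and its two formats are a hypothetical toggled feature invented for the example.

```python
import unittest


def price_label(amount: float, new_format_enabled: bool) -> str:
    # Hypothetical toggled feature: the new path adds a currency code.
    if new_format_enabled:
        return f"AUD {amount:,.2f}"
    return f"${amount:.2f}"


class PriceLabelTests(unittest.TestCase):
    def test_both_toggle_states(self) -> None:
        # One expectation per toggle state, so neither path goes untested.
        expected = {True: "AUD 1,234.50", False: "$1234.50"}
        for enabled, label in expected.items():
            with self.subTest(toggle=enabled):
                self.assertEqual(price_label(1234.5, enabled), label)
```

Saved in a test module, this runs with `python -m unittest`, and `subTest` reports the on and off states separately if one of them fails.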
With that in mind, here are some best practices for creating and maintaining feature toggles:
- Create a descriptive name for your feature toggle
- Never re-use an old feature toggle
- Avoid making a feature toggle dependent on another feature toggle
- Create a sunset policy for your feature toggles
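A couple of these practices can be enforced with a small toggle registry. The sketch below is one possible shape, assuming each toggle records an owner and a removal date when it is created; `ToggleDefinition`, `register` and `overdue` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ToggleDefinition:
    name: str        # descriptive, e.g. "checkout-new-payment-provider"
    owner: str       # team responsible for removing it
    remove_by: date  # sunset date agreed when the toggle is created


def register(registry: dict, toggle: ToggleDefinition, retired_names: set) -> None:
    # Refuse names that are in use or were used before, so an old toggle
    # is never silently re-used for a different feature.
    if toggle.name in registry or toggle.name in retired_names:
        raise ValueError(f"toggle name already used: {toggle.name}")
    registry[toggle.name] = toggle


def overdue(registry: dict, today: date) -> list:
    """Names of toggles past their sunset date, sorted for stable output."""
    return sorted(t.name for t in registry.values() if t.remove_by < today)
```

Running something like `overdue(registry, date.today())` as part of a build would surface toggles that have outlived their sunset date before they start to bite.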
In a future article, I’ll share the Sitecore Feature Toggle implementation.