Sitecore web application won’t start

I needed to upgrade one of the third party libraries in my Sitecore solution, and I expected it would be as easy as entering a few commands through the Package Manager Console in Visual Studio. Sure enough, it installed the new version in my projects and the solution built successfully, but to my surprise, when I tried to browse the homepage it just kept loading indefinitely.

Curious about what could cause this, I isolated the problem to my recent dll changes, and sure enough they were the culprit. But there was no exception in the Sitecore logs, and the page didn’t return a runtime error either.

It took a while to figure this one out, but essentially I found that our Dependency Injection setup throws an exception (which isn’t logged anywhere) whenever it fails to load an assembly. The exception was pretty generic and didn’t capture which assembly was failing to load.

I was finally able to isolate the problematic assemblies by using the Assembly Binding Log Viewer (fuslogvw.exe) and updating the registry to enable assembly binding logging.
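For reference, these are the usual Fusion log registry settings – a sketch using reg.exe; the log folder must already exist, you may need to recycle IIS for them to take effect, and remember to remove them afterwards as they slow every assembly bind down:

reg add HKLM\SOFTWARE\Microsoft\Fusion /v EnableLog /t REG_DWORD /d 1
reg add HKLM\SOFTWARE\Microsoft\Fusion /v LogFailures /t REG_DWORD /d 1
reg add HKLM\SOFTWARE\Microsoft\Fusion /v ForceLog /t REG_DWORD /d 1
reg add HKLM\SOFTWARE\Microsoft\Fusion /v LogPath /t REG_SZ /d C:\FusionLog\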

In my case, the third party library I was upgrading depended on a lower version of System.Net.Sockets than the one in my solution, which I managed to fix with a few assembly binding redirects.
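The redirect itself goes in web.config and looks something like this – the version numbers and public key token below are illustrative; take the real ones from the Fusion log or the exception message:

<runtime>
  <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
    <dependentAssembly>
      <assemblyIdentity name="System.Net.Sockets" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
      <bindingRedirect oldVersion="0.0.0.0-4.2.0.0" newVersion="4.2.0.0" />
    </dependentAssembly>
  </assemblyBinding>
</runtime>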

Investigating Performance Problems using Memory Dumps

I recently had to brush up my WinDbg knowledge due to a performance issue that occurred in a production environment.

Normally you don’t have to go down the memory dump route to get an idea of what’s causing the performance bottleneck in your application. If you have an APM tool such as New Relic, you’ll be able to identify the hotspots in your application; if you don’t have an APM tool, then at a minimum you should use Windows performance counters to gather metrics on hardware utilization (CPU, memory, disk I/O) along with the ASP.NET performance counters.

I especially like the New Relic Thread Profiler feature, which provides the unique ability to run a profiler against your application.. even in a production environment – no, I didn’t make a mistake there. You can read more about it at https://docs.newrelic.com/docs/apm/applications-menu/events/thread-profiler-tool

And oftentimes the NR Thread Profiler analysis will provide me with the information I need to pinpoint the application bottleneck and work with the team to come up with an optimization plan.

But there are times when you just need a lot more detail on what’s going on inside your application – that extra information that helps you validate your hypothesis. Enter memory dump analysis..

Capturing memory dumps

There’s an existing Sitecore KB article that explains how to capture memory dumps using different tools, which you can refer to: https://kb.sitecore.net/articles/488758

Memory dump analysis

There are a couple of tools that you can use to analyze your memory dump files. I’ll go through each one and explain what I use them for.

Debug Diag

This tool is very useful as it finds problematic areas of your application based on the memory dump file that you provide. You can also feed it multiple memory dump files taken consecutively from the same server to get better results in a performance analysis.

Once you have the memory dump file available, open it in Debug Diag, select the analysis rules and run an analysis.

It will then generate a .mhtml file that can be opened with Internet Explorer, where you’ll see a summary of the analysis outcome and the stack traces of the offending threads.

WinDbg

WinDbg is the main tool, in which you’ll spend most of your time doing memory dump analysis, as it’s very powerful and has plenty of useful commands and extensions that can help you narrow down the cause of the problem.

There’s a newer version of WinDbg called WinDbg Preview, available through the Windows Store, which has a more modern look to it, but the commands that I reference in this section should apply to the older version as well.

The first thing that you want to do is to load the SOS extension:

.loadby sos clr

And the Mex extension – which you can download at https://www.microsoft.com/en-us/download/details.aspx?id=53304

.load [fullPathToMexExtensionFolder]\mex.dll

And the Sosex extension – which you can download at http://www.stevestechspot.com/

.load [fullPathToSosexExtensionFolder]\sosex.dll

Useful commands

!threadpool

This command allows you to check the CPU utilization at the time the memory dump was taken, along with the number of active worker threads and IOCP threads.

!runaway

This command displays all the threads that were running, ordered by how much execution time each one consumed up until the memory dump was taken.

~[threadId]s

This command sets the context to the specified thread id, which allows you to run commands against that thread.

!clrstack

If the current thread is executing .NET managed code, this dumps its full stack trace – which is useful to help identify problematic code in your application.
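Putting the commands so far together, a typical triage sequence looks like this (thread 21 is just an example picked from the !runaway output):

!threadpool
!runaway
~21s
!clrstack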

!do [address]

This command dumps the object information at the specified memory address.

!syncblk

This command displays all the threads that own a lock. If a thread has a high MonitorHeld value, it suggests you might have a deadlock or a lock contention issue. The owning thread accounts for a value of 1 and each waiting thread adds 2, so the value is 0, 1 or an odd number. For example, if the value is 89, then this thread owns a lock (the 1) and the remaining 88 means 44 other threads are waiting on it.

!sosex.dlk

This command analyzes whether a deadlock has occurred between the threads.

!mex.clrstack2

Similar to the !clrstack command above but better.

!mex.sqlcn

This command displays information about .NET SQL connection pool objects, which is useful for finding out how many SQL connections were open.

!mex.aspxpagesext

This command displays all the running and completed ASPX HTTP requests that were being processed at the time of the snapshot. You can sort the results by the longest time taken to isolate the area of your performance problem.

Visual Studio

You can open your application source code in Visual Studio along with the memory dump file, which lets you switch between threads and see what line of code each one is executing. I normally combine this with WinDbg: I use WinDbg to isolate the problematic threads and then have the convenience of Visual Studio to explore the source code.

JetBrains dotMemory

You can also open the memory dump with JetBrains dotMemory to get an idea of what objects were created by your application. You want to get a good sense of what’s normal for your application here: how many custom objects you expect, and whether there’s some anomaly.

If you’re investigating a memory leak, you would spend more time in this area.

Sitecore Pipelines – Thread Safety

This might not be widely common knowledge, but when creating your own Sitecore pipeline processors you need to make sure your code is thread safe, as by default pipeline processors are created as singletons.

I encountered this a few years ago and recently stumbled upon it again, so this blog post serves as a reminder for my future self.

Take a look at the following example.
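A minimal sketch of the problem, with purely illustrative class and field names:

using Sitecore.Pipelines.HttpRequest;

public class TrackRequestPath : HttpRequestProcessor
{
    // BUG: an instance field on a processor that Sitecore creates once
    // and reuses for every request – all requests share this value
    private string _requestedPath;

    public override void Process(HttpRequestArgs args)
    {
        _requestedPath = args.Url.FilePath;

        // ... do some work; meanwhile another request can run Process()
        // on this same instance and overwrite _requestedPath ...

        // may now log a different request's path
        Sitecore.Diagnostics.Log.Info("Processing " + _requestedPath, this);

        // Fix: use a local variable instead of the field
        // (or make the processor transient – see below)
    }
}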

When testing locally with a single user, the issue won’t be apparent, as there’s usually only a single request triggering the code block. But given enough volume of incoming requests, you’ll start seeing weird behavior where the result is overridden by a different thread.

The reason Sitecore chooses to create pipeline processors as singletons by default is most likely performance, though I wish this was officially documented.

Another way to remediate the issue is by changing the object life cycle from singleton to transient; by doing this, Sitecore will create a new instance of the processor class whenever the pipeline processor is executed. This can be done by setting the reusable property to false.
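In the processor definition, that’s a single attribute on the <processor> node – for example, using the illustrative type from the sketch above:

<processor type="MyProject.Pipelines.TrackRequestPath, MyProject" reusable="false" />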

SUGCON ANZ 2019

A little more than a week ago the first SUGCON ANZ was held in Sydney and I had the opportunity to be one of the speakers.

There have been a few fantastic blog posts prior to this one which summarize the event quite well and include some references to the live event, so I won’t be covering that. Instead, I’ll talk a little bit about the topic that I presented at this event.

My colleague Gitesh Shah and I co-presented the topic ā€œAccelerating your development team with Dockerā€. In the session we discussed what Docker is, how you can run Sitecore in a Docker container in your local environment, and what the developer experience looks like, including deploying and debugging your code in a Docker container.

We also included a quick demo of running Sitecore Docker containers in Kubernetes using Azure Kubernetes Service, which currently has Windows container support in preview, to give the audience a glimpse of future possibilities.

I believe that in its current state, Docker is a great tool for streamlining the developer experience around deploying and shipping code, even for Sitecore projects. If your organization hasn’t looked into Docker and how it can help your Sitecore project development, I suggest looking into it now and getting your development team familiarized with the tool and how it can be applied to solve specific problems that you currently have.

There’s also been a great effort from the community around maintaining a centralized Sitecore Docker repository on GitHub, which has all the Dockerfiles to build Sitecore Docker images from version 8.2 up to the current latest version of Sitecore. Although it’s not officially supported by Sitecore yet, the Sitecore community in the Slack #docker channel will be able to help if you have any questions. Again, massive kudos to the folks who made the repository a reality.

My talk aside, it was a great experience to attend SUGCON ANZ 2019. I had the chance to catch up with people I’ve worked with over the past 10 years, to meet new people from the community who are passionate about Sitecore and software engineering in general, and to meet Sitecore Rockstars such as Alex Shyba, Akshay Sura and Mike Reynolds in person, among others.

I’d also like to thank AKQA for giving me the opportunity to attend SUGCON this year and come out of it with a great experience.

I’m looking forward to attending SUGCON next year, this time in Melbourne šŸ™‚

Decoupling feature activation with Sitecore Feature Toggles (Part 1)

I had the opportunity to present on Sitecore Feature Toggles at the Sitecore Melbourne meetup last week, possibly the largest Sitecore Melbourne meetup I’ve seen so far. A few people came up to me after the session saying they found it useful and were keen to chat about the pros and cons of the approach. I thought I’d keep a brief summary of the key takeaways from my talk for future reference.

This will be part 1 of 2 of the Sitecore Feature Toggles series. In the first part I will cover the various scenarios where the Feature Toggle design pattern has helped the organization, and in the second part I will go into more detail about the Sitecore feature toggle implementation.

During my talk, I gave real world scenarios and examples of how a simple design pattern like Feature Toggle was able to help solve some of the challenges that we had to deal with in software release management. The challenges were both technical and organizational. On the technical side, in order to handle the incoming visitor traffic and the business goals, we had to deal with large scale Azure infrastructure, distributed systems using an event driven architecture pattern, a complex Sitecore Commerce implementation, an advanced custom search implementation and other technical challenges, whilst keeping security, high availability, performance, analytics and, believe it or not, code quality first in mind. On the organizational side, there were challenges around managing and coordinating tens of scrum teams working together in unison to release new features to the production environment in a short release cycle, with the constraint of having those teams work across multiple countries, timezones and cultures.

Feature Toggle driven development was the approach that we adopted to help with some of the challenges that we faced: for every new feature that we develop, we also develop a feature toggle to give us more control over the feature’s activation. Let’s look at the types of scenario it has helped us with so far.

Avoiding long lived feature branches

One of the common problems that we had was long lived feature branches, normally associated with a feature that requires infrastructure changes. The problem with long lived feature branches? Merge conflicts! By the time you’ve worked in isolation to get your feature completed and merge your code changes, you have to deal with tons of code changes that were introduced by the other teams. You end up spending a lot of your time fixing those merge conflicts and performing regression tests, and depending on the changes the other teams have made, you might spend another few days incorporating those changes – remember that other teams are working on new features as well. If you keep making small tested changes and merge your code early and often, you’re less likely to encounter merge conflicts.

Avoiding dedicated test environments

Dedicated test environments are expensive to maintain, especially if you don’t have a fully automated process which you can use to spin up a new test environment entirely from scratch. Avoid them.

The topic around having a dedicated test environment in the organization normally comes up when a feature requires infrastructure changes. With feature toggles, you can manage the infrastructure changes to a certain extent so you don’t always need to have a dedicated test environment.

Again, spinning up a new dedicated test environment and keeping it in sync with the latest changes is expensive. Reduce the need for one by using feature toggles.

Short Release Cycle

One of the challenges the organization has is coordinating the software release across a large number of teams whilst making sure that everyone is working in unison towards the same deployment date at 2 week intervals.

With feature flag driven development, the scrum teams merge their code changes early and frequently, which means everyone’s feature is continuously being deployed to the same test environment for testing, along with the tens of other features that the other teams are working on.

This approach allows the scrum teams to work in unison towards the same deployment date.

Managing risks around feature activation

Having a feature toggle associated with the new feature that a team is working on means having more control over the feature’s activation. You can use that feature toggle as a kill switch when the feature isn’t behaving as intended. For example, in the scenario where a team in a different timezone is blocked because a feature deployed to the test environment is causing issues, rather than spending the rest of the day twiddling their thumbs and feeling frustrated because they can’t get any work done, they can disable the feature using its toggle, which falls back to the old/default behavior and allows them to continue working.

In a production scenario, you can imagine this kill switch being used to quickly mitigate any high risk incident related to new features, to help avoid commercial loss. Or, if it’s related to the operational aspects of the software, such as a replacement for an expensive algorithm causing high CPU or memory usage, you can use an Ops toggle to give the servers some breathing room whilst your team looks into the problem.

Once you compare that approach with doing a deployment rollback, or working late hours to come up with a hotfix, you’ll value the control that feature toggles give you.

Incremental and safe roll-out

Having a way to incrementally roll out a feature is essential for business, and there can be various reasons for it. For example, we might want to incrementally roll out a particular feature because we want to observe how it impacts the current infrastructure capacity in the production environment, to help drive more accurate server capacity planning. Another example might be that the client’s operational infrastructure just isn’t ready yet to handle their entire customer base, and they want to trial the feature first with a specific segment of their customers, hoping to get quick feedback and verification.

Once the client is happy with the small roll-out, they can decide to roll out the feature to their entire customer base.

Aligning feature activation to marketing date

Let’s say the client tells you they want the feature to be activated on a certain date, and coincidentally this date doesn’t line up with your deployment date. What do you do? Adjust the deployment date? Do a midnight deployment and spend the entire night working overtime, eventually causing your team to burn out?

With feature toggles, you can avoid this problem because you’ve essentially decoupled your software deployment from feature activation/release.

Pros and Cons

All the above scenarios demonstrate the pros of using feature toggles, which I’ll list again to summarize:

  • Avoiding long lived feature branches
  • Avoiding dedicated test environments
  • Having a short release cycle
  • Managing risks around feature activation
  • Incremental and safe roll-out
  • Aligning feature activation to marketing dates

With the pros also come the cons.

Increased code complexity

Because a feature toggle essentially introduces new code paths into your application, you need to maintain at least 2 different code paths. This increases code complexity in trade for more control over the new feature that you’re building.
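As a minimal sketch (the toggle key and method names here are made up), every toggle turns one code path into two, and both have to keep working:

if (FeatureToggles.IsEnabled("new-header"))
{
    RenderNewHeader();    // the new behavior being rolled out
}
else
{
    RenderLegacyHeader(); // the old behavior you maintain until the toggle is retired
}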

Increased testing effort

Because of the different code paths associated with a feature toggle, you also need to put more effort into increasing code coverage to cover the various scenarios associated with that toggle, across all the automated test tools that you have.

Increased maintenance effort

Having too many feature toggles in the application can make it daunting to maintain. A feature toggle left lying in the codebase when it no longer serves any purpose only decreases code maintainability and can bite you hard in the future.

With that, here are some best practices around creating and maintaining feature toggles:

  • Create a descriptive name for your feature toggle
  • Never re-use an old feature toggle
  • Avoid having feature toggle dependent on another feature toggle
  • Create a sunset policy for your feature toggle

In a future article, I’ll share the Sitecore Feature Toggle implementation.

Stay tuned.

Revisiting SonarQube integration 6 months later

It’s been more than 6 months since my previous post on SonarQube, and with a bit of downtime here and there since then, I’ve had enough time to think about and try out different integration scenarios with our current development workflow.

There are 2 main integration scenarios that I recommend you try out.

Continuous Code Quality Monitoring

In this integration scenario you run code analysis on your integration/release branch regularly to get feedback on every merged PR, so you can identify early whether there’s a detrimental effect on the current health of your code base.

I find the following metrics, which I set as my Quality Gate in SonarQube, to be particularly useful:

  • How much code coverage does this repository have, and what minimum threshold should I set?
    • Some separation-of-concern layers are more valuable than others (hint: your Business Logic layer)
  • How much code coverage is provided on new code?
    • I find this helpful more in a culture and habit sense, where you want to nurture good habits and a culture of striving for good code quality in your development team
  • Set a threshold on the number of major and blocker issues in your project.

In some cases (I’m thinking of a startup environment or a product focused team) where the team actually considers these metrics to be one of the criteria for whether the code can be released, they’ll hook it up to a rotating red light if the Quality Gate doesn’t pass – pretty sure someone has done this already šŸ™‚

Automated Bitbucket PR review comments

I find this a killer feature, depending on the complexity of the software you’re developing. If you’re working with multiple teams across multiple timezones, I find it crucial to invest in implementing standards and conventions through documentation and regular catch-ups between the key members of the team, to ensure architecture consistency and to keep on top of technical improvements.

Now imagine that, through the use of static code analysis, you have the power to help enforce those standards on every new PR that’s raised, with automated PR review comments from SonarQube.

SonarQube comes with a decent number of out-of-the-box code rules which you can leverage and start using on your repository, but where it truly shines is when you start building your own custom code analyzer and turn it into a SonarQube plugin.

With the custom code analyzer in place, every time one of your team members on the other side of the earth violates the agreed-upon coding standard, the custom code analyzer can pick it up and warn them, providing either a description of why it’s considered a violation or a link to your coding standard documentation page. A minimal sketch of such an analyzer follows the list below.

The PR reviewer also has the assurance that SonarQube will run static code analysis to pick up violations of industry standard practices as well as your project specific coding standard, so they can focus on other areas of the PR.

Think of the following custom code analyzers:

  • Sitecore common pitfall analyzers
    • Don’t use Sitecore fast query
    • Use Sitecore ID instead of field name to get field values
    • Don’t use Sitecore axes/descendant API
    • etc
  • Sitecore Helix architecture analyzers
    • Project cannot depend on another project
    • Feature cannot depend on another feature
    • Foundation cannot depend on Feature
    • etc
  • Client specific/project specific/repo specific code analyzers
    • Stop using “x” pattern in the codebase and instead use “y”
    • Define a custom analyzer once and distribute it through SonarQube so every developer in your team is aware of it.
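As a minimal sketch of the first pitfall above – a Roslyn analyzer that flags string literals starting with the fast query prefix (the rule id and messages are made up, and packaging it as a SonarQube plugin is a separate step):

using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public class FastQueryAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "SC0001",
        title: "Avoid Sitecore fast query",
        messageFormat: "Avoid 'fast:' queries; see the coding standards page",
        category: "Sitecore",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.EnableConcurrentExecution();
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);

        // Inspect every string literal in the compilation
        context.RegisterSyntaxNodeAction(AnalyzeLiteral, SyntaxKind.StringLiteralExpression);
    }

    private static void AnalyzeLiteral(SyntaxNodeAnalysisContext context)
    {
        var literal = (LiteralExpressionSyntax)context.Node;

        // Sitecore fast queries are plain strings starting with "fast:"
        if (literal.Token.ValueText.StartsWith("fast:"))
            context.ReportDiagnostic(Diagnostic.Create(Rule, literal.GetLocation()));
    }
}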


Key takeaway

A few key takeaways from my musings on SonarQube and code quality:

  • The code quality metrics that the SonarQube dashboard provides are great to use as milestones to drive technical improvements
    • The code quality profile in SonarQube for a particular project needs to be regularly reviewed, otherwise it’s just noise in the metrics which will never get resolved.
  • Custom code analyzers integrated with automated Bitbucket PR review comments are awesome
    • It’s a great automated way to enforce standards and make the PR review process easier

Sitecore implementation strategies to mitigate production environment issues

I’ve found the following techniques prove to be very handy in ensuring smooth website operation, especially if you’re running a mission critical site, for example an online shopping site or another high traffic site.

Feature Toggles

Feature toggles are a software design pattern that lets you modify the behavior of the software without modifying the code – https://martinfowler.com/articles/feature-toggles.html

It’s a simple technique yet ridiculously powerful if used correctly.

We’ve used feature toggles for different reasons:

Release toggles

Have a shiny new feature that you’re working on in the current release that’s half finished and not yet ready to be included in the next deployment, but you don’t want to keep it in a long lived feature branch? Merge it into the main branch and use a feature toggle to keep it dormant.

Got to deploy that new header component but it’s not 100% done yet? Use a feature toggle.

Got a cool new Sitecore rendering which displays a different look and feel, defined in a custom Sitecore device? Use a feature toggle to control the roll-out.

Ops toggles

Got server resources grinding to a halt due to a sudden spike in traffic? Ideally you should plan this out with the client to mitigate the sudden traffic increase, which could come from a marketing campaign – if not, turn off some of the non-critical and resource intensive operations on the site using feature toggles to reduce the load.

Got poorly performing functionality on the site? I hope you’re doing some sort of performance testing before deploying in the first place, so you can catch it earlier šŸ™‚ – if not, then I hope you implemented a feature toggle to turn it off.

Implementation

No full code walkthrough here, it’s very basic – didn’t I mention this is a very simple technique?

The core implementation that I’ve used relies on Sitecore content items and item publishing.

  • I have a multi site implementation
  • Each site has a feature toggles folder
  • Feature toggles folder contains different types of feature toggle Sitecore items, for example:
    • Enable new Header layout
    • Enable site notifications
    • Enable contact us email forwarding

Every feature toggle contains the basic functionality: a single checkbox to define whether the functionality should be enabled/disabled. That’s it.
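To show just how basic it is, checking a toggle boils down to reading that checkbox – a minimal sketch (the item path and field name are illustrative):

using Sitecore.Data.Fields;
using Sitecore.Data.Items;

public static class FeatureToggles
{
    public static bool IsEnabled(string togglePath)
    {
        // e.g. /sitecore/content/MySite/Feature Toggles/Enable Site Notifications
        Item toggle = Sitecore.Context.Database.GetItem(togglePath);
        if (toggle == null)
            return false; // a missing toggle item means the feature stays off

        // The single "Enabled" checkbox drives the whole mechanism
        CheckboxField enabled = toggle.Fields["Enabled"];
        return enabled != null && enabled.Checked;
    }
}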

You can extend the functionality to also include IP white-listing, country detection, etc. Go nuts šŸ˜€

Sitecore Site Snippet

Another simple mitigation technique is using a Sitecore site snippet.

The site snippets I’m referring to here are javascript and css hacks that modify the behavior or look and feel of the site in the browser. A sticky plaster fix, if you like: it gives you the ability to fix an issue quickly without requiring a file deployment, buying you time to focus on a proper fix to be included in the next deployment.

You can of course deploy a hotfix containing the js or css files, which also works; if you have multiple Content Delivery servers, make sure you have an easy deployment process where you can just press a single button to get it done.. or use a site snippet.

Implementation

The site snippets I’ve built are just custom API endpoints which return js or css – a rough sketch of the js endpoint follows the list below:

  • Define the API controller endpoints
    • /api/sitesnippet/js
    • /api/sitesnippet/css
  • I have multi site implementation
  • Each site has a Site Snippets folder
  • Site Snippets folder contains different types of snippet Sitecore items, for example
    • Hide newsletter component from footer (css)
    • Override click event from the subnav component (js)
  • Those site snippets are then managed through a Site Snippet Configuration where an admin can select which site snippets are activated
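As promised, a rough sketch of the js endpoint: it just concatenates the bodies of the activated snippet items and returns them with a javascript content type (the item path and field names are illustrative, and the css endpoint is analogous):

using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Web.Http;
using Sitecore.Data.Items;

public class SiteSnippetController : ApiController
{
    // Routed as /api/sitesnippet/js via attribute or convention routing
    [HttpGet]
    public HttpResponseMessage Js()
    {
        var builder = new StringBuilder();

        // Hypothetical location of the current site's snippet items
        Item folder = Sitecore.Context.Database.GetItem("/sitecore/content/MySite/Site Snippets");
        if (folder != null)
        {
            foreach (Item snippet in folder.Children)
            {
                // Only emit snippets that the admin has activated
                if (snippet["Enabled"] == "1")
                    builder.AppendLine(snippet["Body"]);
            }
        }

        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(builder.ToString())
        };
        response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/javascript");
        return response;
    }
}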

 

CDN Cache Rules

If you’re using a CDN such as CloudFlare, it can act as an application gateway where you can define custom rules to mitigate certain issues occurring on your site.

Have you encountered some of these issues?

  • You have a public API endpoint which suddenly gets abused by a malicious hacker
  • Certain API endpoints of your site are responding poorly and it’s affecting all the sites on the same server instance
  • etc

With CloudFlare, you can block those endpoints using CF rules to prevent the issue from becoming a major disaster, giving you time to come up with a proper plan to tackle the issue.

That’s all for now, let me know if you have some other techniques which you’d like to share.

Maintaining code quality with SonarQube

When working in a large solution on a project that’s been going on for years (Sitecore project or not), there are bound to be technical debts here and there: technical decisions which were made in less than ideal conditions, leading to shortcuts – not necessarily wrong decisions, but the best decisions considering the situation at the time.

So how does one maintain and improve code quality in such a code base? Especially when people come and go on the project, and without fully understanding the code base as a whole, they perform the bare minimum to complete the work – which usually ends up as more technical debt that never gets picked up, as it gets buried deeper within the code base as time goes by.

There are several ways we can help reduce the ongoing technical debt:

  • Increasing our test coverage (unit test, functional test, integration test)
  • Doing PR review
  • Implementing the boy scout rule for every PR (leave the campground cleaner than when you first arrived)

I’ve been digging around for the last week, seeking ways to extract the health state of a code base and get some sort of stats which we can then use as a baseline for improvement iterations. I had a quick look at what tools are available and gathered my notes on SonarQube.

SonarQube Dashboard Report

I first heard about SonarQube at my old company 4 years back (it was called Sonar back then). I never gave it much thought at the time, but I’ve now decided to have a quick look into it, and so far I like what I’m seeing and the capabilities it has.

One of the main reasons I’m so keen on this tool is that it can provide the stats I’m looking for from the current code base – the key thing, if you’re doing some sort of improvement, is to have concrete parameters. Another big reason is that since the tool has been around for a while and matured, it has lots of plugins – including plugins which integrate nicely with our tooling and processes.

Here are some of my notes so far on this tool, which looks pretty cool.

Integration with the developer experience

  1. Integrate with Visual Studio
  2. Integrate with Resharper

Integration with JIRA

https://docs.sonarqube.org/display/PLUG/JIRA+Plugin

This plugin can create JIRA tickets based on SonarQube reports. At the time of writing, this plugin has been labeled as deprecated though.

Integration with Team City

https://confluence.jetbrains.com/display/TW/SonarQube+Integration

This plugin offers the following features:

  • Easy SonarQube Runner deployment and configuration
  • SonarQube Server connections management
  • Test results locator. Works for Surefire and MSTest reports at the moment
  • JaCoCo coverage result locator
  • Navigation to SonarQube Server from the TeamCity UI: the View in sonar link on the Build Results page takes you to the Sonar dashboard of the analysed project once the build is finished.

Now, you don’t really need this plugin if all you need is to run the SonarQube analysis runner and view the report in the SonarQube web application. All you need to do is execute the SonarQube analysis runner command when you build your solution file; it will generate the XML report files and send them to the SonarQube server.

Integration with Bitbucket

https://marketplace.atlassian.com/plugins/ch.mibex.bitbucket.sonar/cloud/overview

  • Extract the health state of your code base

  • Highlight failed quality checks

  • Instant feedback in your PR – really cool I’d say šŸ™‚

So far it looks really promising; hopefully I can share more on this tool as I get some time to play around with it.

How TDS Git Delta Deploy make my life easier

On a recent project, most of the team members raised concerns about the long production deployment time, which could span an entire working day, and even more if they encountered issues.

One of the things I noticed could be optimized immediately was the TDS package installation process: instead of installing the full TDS packages, which can contain thousands of items (in my case it was more than 12,000 items), we could use the option of only deploying the delta items – which for every release cycle (2 weeks) normally amounts to around 50 items or even less.

TDS Git Delta Deploy itself is something that Sitecore MVP Sean Holmesby created to answer how to have a true ā€œdeltaā€ of content items deployed to the target environment. Have a read of the full article here.

So, now that I knew the solution to one of the pain point areas, I just needed to install Sean’s nuget package and go on my merry way, right? Well.. not exactly.

The problem

In the current solution the team has multiple TDS projects per feature (Helix anyone?); those TDS projects are then bundled using TDS package bundling to create the final combined package.

To give a bit of an illustration, see below.

Given the following TDS projects

  • TDS Foundation Z
  • TDS Feature A
  • TDS Feature B
  • TDS Feature C
  • TDS Project X (will bundle all content items from the above TDS projects)

So what’s the problem? The package bundling doesn’t respect TDS Git Delta Deploy: it always includes all the items instead of just the delta.

The solution

I reached out to the nice folks in the #tds slack channel at https://sitecorechat.slack.com/ for suggestions – btw, if you haven’t joined, then I suggest you do that first, else you’re missing out šŸ™‚

Not long after I raised the question, John Rappel replied that he had a fix for it in the latest nuget package – awesome! I upgraded the nuget package and, true enough, it now works. Read more about John’s bugfixes and feature updates here.

So now I had TDS delta packages working locally; it was time to integrate them into our Continuous Build process.

The setup in Team City

TDS Git Delta Deploy uses the ā€œgit diffā€ command at its core, so for it to work correctly you need to ensure that the full Git history is available to the build – in Team City that means using agent-side checkout (so the .git folder is present on the build agent) rather than the default server-side checkout.
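Conceptually, the delta comes from asking Git which serialized items changed since the last deployed revision – something like this (the revision range is illustrative):

git diff --name-only <lastDeployedCommit> HEAD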

A quick POC confirmed that the approach was solid, and I saw those delta packages being generated!

TDS Delta packages generated.. what now?

With the TDS delta packages generated and ready to use, all we had to do was include them as part of our Continuous Deployment process. We’re using Octopus Deploy, so it was quite easy to set up.

You can use something like Sitecore Package Deployer to automate the package installation process and include it as part of an Octopus Deploy step. In our case, we already had an existing custom web service that does the same job, so we leveraged that instead.

Quick tip:

Enable TDS’s ā€œpublish package after deployā€ option to streamline the deployment process by automatically publishing only the deployed items.

Here’s the reference on how to do just that: https://hedgehogdevelopment.github.io/tds/chapter4.html

What’s the outcome?

Having all this in place really helped streamline the way we do Sitecore deployments and cut massive amounts of time, which in turn means faster deployments => more time to work on something else => productivity++

Sitecore logging integration with Graylog

With Sitecore infrastructure that expands beyond the basic 1 CM and 1 CD, having a centralized logging mechanism is crucial in order to understand what happens on the servers in a consolidated view.

Some of the popular APMs out there on the market offer automatic recording of exceptions/errors that occur in the application; this proves really useful when investigating a problem down to the exact line of code which raised the exception. But some types of issues require you to dig through the Sitecore logs to gain an understanding of what happened on the server itself, for example when investigating publishing issues, security audits, index crawling, EXM logs and FXM logs on the delivery servers.

For those types of issues we typically look at the different types of Sitecore logs on each server to find out what exactly happened; if there are multiple servers in the picture, we’d need to go into each server, or pull down and analyze the log files from each of them. This is where centralized logging for Sitecore logs comes in handy, as we’d have a centralized view of the log files spanning multiple servers, and we can run search queries against the log information.

A couple of centralized log providers on the market are:

  • https://azure.microsoft.com/en-us/services/application-insights/
  • https://www.splunk.com/
  • https://www.elastic.co/products/elasticsearch
  • https://www.graylog.org/

I stumbled across Graylog on a recent project where we currently push Windows event logs to it, but not the Sitecore log information yet. So I played around with it a bit to get a feel for it *rolls sleeves up*.

There are a couple of steps that we need to figure out in order to integrate the Sitecore logs with Graylog:

  1. Create a custom log appender
  2. Install Graylog server
  3. Configure Graylog server
  4. Send Sitecore log information to Graylog server

Create a custom log appender

Sitecore uses log4net as its logging framework, which is extensible; by default it uses the LogFileAppender class, which outputs the log information to text files.

As I want to send the Sitecore log information to the Graylog server over HTTP, I need to create a custom log appender. And since log information sent from Sitecore to the Graylog server needs to be in a certain format – GELF format to be precise – we need to format the log entries to match the GELF specification in order for the Graylog server to understand and parse them.

I found the gelf4net library, mentioned in the Graylog documentation, which has already done all the heavy lifting of formatting the data. However, when I installed the nuget package and configured the log4net section in Sitecore according to the library’s documentation, it didn’t work.

One gotcha that I found is that when creating a custom log appender for Sitecore, we need to reference the AppenderSkeleton class in the Sitecore.Logging assembly – previously I had added the log4net nuget package (which comes with gelf4net as a dependency) and was hoping that would work; instead, it failed miserably šŸ™

In the end I created my own appender class which replicates the gelf4net appender implementation: https://github.com/reyrahadian/sitecore-gelf-logappender/blob/master/ScGraylog/Appender/GelfHttpAppender.cs
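Stripped down to its essentials, such an appender has roughly this shape – a sketch, not the actual implementation; a real one must JSON-escape the message and map levels and timestamps per the GELF spec:

using System.Net;
using log4net.Appender; // from the Sitecore.Logging assembly, not the standalone log4net package
using log4net.spi;      // the old log4net build that Sitecore ships uses this namespace

public class GelfHttpAppender : AppenderSkeleton
{
    // Populated from the log4net configuration, e.g. <url value="http://graylog:12201/gelf" />
    public string Url { get; set; }

    protected override void Append(LoggingEvent loggingEvent)
    {
        // Minimal GELF payload with only the mandatory fields
        var payload = "{\"version\":\"1.1\",\"host\":\"sitecore\",\"short_message\":\""
            + loggingEvent.RenderedMessage + "\"}";

        using (var client = new WebClient())
        {
            client.Headers.Add("Content-Type", "application/json");
            client.UploadString(Url, payload); // POSTs to the Graylog GELF HTTP input
        }
    }
}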

Install Graylog server

Having read through the Graylog documentation, the easiest and quickest way to set up a Graylog server for my POC was to download the VM. It’s all preconfigured; we just need to load the .ova file using VMware Player or VirtualBox to get it up and running.

reference: http://docs.graylog.org/en/latest/pages/getting_started.html

Configure Graylog server

The next thing we need to do after we have our Graylog server up and running is to configure the input source. Graylog provides multiple options: HTTP, TCP, UDP or file dumps.

An HTTP input source fits what I need for my POC, so I created a new HTTP input source and got it running.

Send Sitecore log information to Graylog server

Here’s where we put things together. With our custom log appender ready, we now only need to send the Sitecore log information to our Graylog server using a log4net configuration along the following lines.
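A sketch of the relevant section – in a Sitecore solution this lives in the <log4net> section of web.config; the appender type/assembly name, host and port here are illustrative and should match your build and your Graylog GELF HTTP input:

<log4net>
  <appender name="GraylogAppender" type="ScGraylog.Appender.GelfHttpAppender, ScGraylog">
    <url value="http://your-graylog-server:12201/gelf" />
  </appender>
  <root>
    <priority value="INFO" />
    <appender-ref ref="GraylogAppender" />
  </root>
</log4net>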

Check if things work as expected

If everything works as expected, then you should see log information coming in from Sitecore.

source code: https://github.com/reyrahadian/sitecore-gelf-logappender