Vukoje.NET

blog about software development

Thoughts on logger lib design

Last year I gave a talk, State of the art logging, where I discussed logging, logging libraries, and tools for log analysis. I wanted to take a step back and share my notes on designing a logging library. I actually wrote down what was important to me and later decided to go with a custom logger implementation that wraps NLog.

When I talk about logging I always distinguish between tracing and event logging. By tracing I mean highly detailed logs useful for debugging the app. Trace is available only in “tracing mode” – trace data is not collected by default. By event logging I mean data that the application outputs to some persistent store and which is used for monitoring application behavior (e.g. crash reports).

 

Tracing

Tracing must have high performance:

  • It should have no performance impact if not turned on
  • String formatting should not be done if tracing is turned off (tracing client code never formats strings itself because formatting consumes time and might not be needed – see the sketch after this list)
  • It should have very small memory consumption when tracing is turned off.
  • When turned on it must not endanger app pool execution (e.g. production tracing must not cause out of memory exception).
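
For illustration, here is a minimal sketch of deferred formatting (the Tracer type and its members are hypothetical, not a real library API): the caller passes a format string and arguments, and the expensive string is built only when tracing is actually enabled.

using System;

public class Tracer
{
    public bool IsEnabled { get; set; }

    // The caller never calls string.Format itself; formatting happens
    // only when tracing is actually turned on.
    public void Trace(string format, params object[] args)
    {
        if (!this.IsEnabled)
            return; // no formatting cost while tracing is off

        string message = string.Format(format, args);
        this.Write(message);
    }

    private void Write(string message)
    {
        Console.WriteLine(message); // or a file, a buffer...
    }
}

// Usage – the arguments are cheap to pass, the string is built only if needed:
// tracer.Trace("Loaded {0} products in {1} ms", count, elapsedMs);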

Tracing should be enabled per session in web applications (multitenant or not) but should also work in cases where a session doesn’t exist (tracing Windows services, tracing web applications before session initialization).

Tracing should support priority levels and labels (categories).

Tracing should support “Start – End” style logs with the using statement. This will make the tracing API better (smaller and more readable code) and make performance monitoring easier (time of method execution). It should also enable hierarchical display of trace data (similar to call stacks in debugging).
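
A minimal sketch of how such a scope could be implemented with IDisposable (the TraceScope type is hypothetical; it logs to the console here just for brevity):

using System;
using System.Diagnostics;

public class TraceScope : IDisposable
{
    private readonly string _name;
    private readonly Stopwatch _watch = Stopwatch.StartNew();

    public TraceScope(string name)
    {
        _name = name;
        Console.WriteLine("Start: " + _name);
    }

    public void Dispose()
    {
        _watch.Stop();
        Console.WriteLine("End: " + _name + " (" + _watch.ElapsedMilliseconds + " ms)");
    }
}

// Usage – the using statement guarantees the "End" log and the timing,
// and nested scopes naturally form a call-stack-like hierarchy:
// using (new TraceScope("ImportOrderLines"))
// {
//     // traced work...
// }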

If the tracing component uses external components, this must be hidden (our trace will wrap them), because otherwise it would introduce high coupling to those external components and make any change very hard.

Trace should support logging to file. File system organization for multitenant web applications should look like this: Root folder/tenant name/user name/date/request time stamp.log. The programmer should be able to configure a simpler folder structure in simpler cases (a Windows service that has no tenant or user). The root folder should be configurable. If the session is not yet initialized or is lost, files should be logged to Root folder/tenant name/.
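
As an illustration only (an assumed helper, not part of any real library), building such a path could look like this:

using System;
using System.IO;

public static class TraceFilePath
{
    // Builds: Root folder/tenant name/user name/date/request time stamp.log
    // Tenant and user parts are skipped when they are not available
    // (e.g. a Windows service, or a session that is not yet initialized).
    public static string Build(string rootFolder, string tenant, string user, DateTime requestTime)
    {
        string folder = rootFolder;

        if (!string.IsNullOrEmpty(tenant))
            folder = Path.Combine(folder, tenant);

        if (!string.IsNullOrEmpty(user))
            folder = Path.Combine(folder, user, requestTime.ToString("yyyy-MM-dd"));

        return Path.Combine(folder, requestTime.ToString("HH-mm-ss-fff") + ".log");
    }
}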

In web applications it should be possible to trace each web request in a separate trace file. In other cases the programmer should be able to decide when to split files (per day, month, never…).

Trace should keep a buffer of the last N relevant trace calls (probably only logs with the highest priority). The N constant should be configurable by programmers. Besides this buffer of the last N relevant logs, we might keep the full trace of the last HTTP request. This can be useful when tracing is not turned on and the application breaks – in that case we might still want the full trace from the request that broke the app. This solution might compromise memory, so we must be careful with it.
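
A minimal sketch of such a bounded buffer (hypothetical type, thread safety left out for brevity):

using System.Collections.Generic;

public class TraceBuffer
{
    private readonly int _capacity;
    private readonly Queue<string> _entries = new Queue<string>();

    public TraceBuffer(int capacity)
    {
        _capacity = capacity; // the configurable N
    }

    public void Add(string entry)
    {
        _entries.Enqueue(entry);
        if (_entries.Count > _capacity)
            _entries.Dequeue(); // drop the oldest entry so memory stays bounded
    }

    public string[] Snapshot()
    {
        return _entries.ToArray(); // e.g. attached to an event log entry on error
    }
}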

Trace files older than some configured period should be deleted at regular intervals.

Tracing must not break in production! Breaking is only OK in debug mode. E.g. if the programmer supplies trace with invalid parameters, Trace should break only in debug mode.
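
One possible way to get this behavior is to wrap every trace call in a guard that rethrows only in debug builds (a sketch, not the actual implementation):

using System;

public static class TraceGuard
{
    public static void FailSafe(Action traceCall)
    {
        try
        {
            traceCall();
        }
        catch (Exception)
        {
#if DEBUG
            throw; // let the programmer see the problem while developing
#endif
            // in release (production) builds the error is swallowed
        }
    }
}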

 

Event logging

The event log should contain information about events in the application that are of interest to people monitoring application health and configuration status.

The event log should support these event types:

  • Breaking error
  • Handled error
  • Info
  • Warning

It should be possible to search and filter event log data, e.g. show me today’s breaking errors, or show me yesterday’s info entries containing the text “salesforce.com sync started”.

Event logs should contain enough useful data for understanding event details. E.g. breaking errors should contain the following data (see the sketch after this list):

  • User name of the logged-in user
  • Application name (multiple applications might be logging events to the same DB)
  • Exception data (call stack…)
  • Http request data
  • Trace buffer (what the user did before the error – useful for extracting reproduction steps)
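
As a sketch of the kind of record this implies (hypothetical class, names are illustrative only):

using System;

public class EventLogEntry
{
    public DateTime TimeStamp { get; set; }
    public string EventType { get; set; }       // breaking error, handled error, info, warning
    public string Message { get; set; }
    public string UserName { get; set; }        // who hit the problem
    public string ApplicationName { get; set; } // several apps can log to the same DB
    public string ExceptionData { get; set; }   // message, call stack...
    public string HttpRequestData { get; set; }
    public string TraceBuffer { get; set; }     // last N trace entries before the event
}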

Events should be logged to DB and files. Logging to file is needed for:

  • applications that don’t have DB
  • logging DB access errors (e.g. invalid DB connection string)

Event logging must not break in production! Breaking is only OK in debug mode. E.g. if the event log DB is not accessible, the event logger should not break.

It would be nice to have a feature where event logs can be updated by the support team with a comment (e.g. “Known issue”, “Bug 123 created”, ”Resolved”…).

Event logging should not be transactional. If an event is logged inside some outer transaction and that transaction is rolled back, the event should still be logged.
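
In .NET one way to achieve this is to suppress the ambient transaction around the insert; here is a minimal sketch (the DB call is a hypothetical injected delegate):

using System;
using System.Transactions;

public class EventLogger
{
    private readonly Action<string> _saveToDb; // hypothetical DB insert, injected

    public EventLogger(Action<string> saveToDb)
    {
        _saveToDb = saveToDb;
    }

    public void LogEvent(string message)
    {
        // The event is written outside of any ambient transaction,
        // so it survives even if the caller's transaction rolls back later.
        using (var scope = new TransactionScope(TransactionScopeOption.Suppress))
        {
            _saveToDb(message);
            scope.Complete();
        }
    }
}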

My badass list of JavaScript resources

About a year ago I finally started learning JavaScript so I could build a complex, fast, and fancy client web app that relies on RESTful services. I was completely new to the subject, so I basically needed to learn everything from scratch… the language, best practices, architecture…

So here are the resources I used for learning and the libraries that helped me along the way.

JavaScript basics

First the language and most popular lib – jQuery.

Knockout

Knockout is a great MVVM lib that does client-side rendering and data binding.

Bootstrap

Great CSS framework with fancy controls that supports responsive UI.

Controls

Additional controls that integrate nicely with Bootstrap.

Libs

Don’t make me think – notes from the book

In an effort to teach myself more about web design I read “Don’t make me think” by Steve Krug. I recommend the book to everyone interested in the subject. It seems like a good starting point, and it is also small and reads very fast. If you are planning to read this book, it may be best to wait for the third edition that will be out soon. Update – the third edition is out.

So here are my most important notes, which I am sharing so I can look them up later.

  • A webpage should probably have the following elements, and they should be easy to spot:
    • Site ID/Logo and tagline – Indicates what site this is. It should be clickable and lead to the home page.
    • Site Sections – Indicate what the major sections of the site are. The current section should be highlighted.
    • Site Search – Especially important for users who prefer searching to browsing.
    • Utilities – Less important common links like login, home, how to buy…
    • Page name – Clearly informs users what page they are currently on.
    • Local (contextual) navigation – Shows what the user’s options are on the current page.
    • Breadcrumbs – Clearly explain where the user is in the site hierarchy and offer an easy way to go back to some upper level.
  • The home page is different from the rest of the site and should be designed more like a billboard or a magazine cover. For some visitors the home page will be your only chance to leave a good impression of your site.
  • Home page common elements:
    • Site ID/Logo and tagline
    • Site Sections – to give an overview of what the site has to offer.
    • Site Search
    • Welcome blurb – a short description of the site that should be visible without scrolling.
    • Teases – hints of the site’s good stuff (top stories, hot deals, most popular pages…)
    • Timely content – like recent news/comments, to signal that the site is live and frequently updated.
    • Start here – It should be clear to the user how to begin using the site (e.g. user registration).
  • Things that are related logically should be related visually
  • Make it obvious what is clickable.
  • Omit needless words. Design web pages for scanning, not reading.
  • Good design makes a web site obvious to users and doesn’t require them to think too much about how to use it.
  • Test usability of your site at least once a month.

Batched by default

A few weeks ago I was listening to a lecturer explaining how Salesforce works from a programmer’s perspective. Among other things I noted that Salesforce cares very much about performance. They care so much that they have forbidden sequential access to the database. This is also why the database access APIs are all designed to work with batches.

A few days ago I was writing MS SQL triggers for the first time in a long time. Again I noted that the API for triggers is batched, meaning that the trigger is called only once after all rows in the SQL statement are inserted/updated/deleted. Once the trigger is called, the programmer can access virtual tables containing all altered rows. Of course, the alternative (non-batched) approach would be to call the trigger after each row update, and this would lead to much poorer performance.

Today I analyzed a slow piece of code that is slow because it sequentially loads many complex objects from the database. At first there was a need to load only a single complex object, and that is what the programmers implemented, without support for batching (loading multiple objects at once). Once there was a need to load more objects, the programmers again did the simplest thing – they just ran the single loader in a loop. Both choices sound logical to me, but they lead to a slow piece of code that won’t be optimized without significant effort.

This led me to the thought that all APIs should be batched by default. This leads to faster code and just a slightly less convenient API in cases where we work with one object.
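
A sketch of what “batched by default” could look like for a loader like the one above (Product and the loader interface are placeholders, not the actual code): the batched overload is the primary one, and the single-object overload is just a thin convenience wrapper.

using System.Collections.Generic;
using System.Linq;

public interface IProductLoader
{
    // The primary API works with batches, so one DB round trip can load many objects.
    IList<Product> Load(IEnumerable<int> productIds);
}

public static class ProductLoaderExtensions
{
    // The single-object case costs one extra overload, not a redesign later.
    public static Product Load(this IProductLoader loader, int productId)
    {
        return loader.Load(new[] { productId }).FirstOrDefault();
    }
}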

 

Update:

My friend Andrija Cacanović reminded me that jQuery’s fluent API is a nice example of a batched API.

 

Sinergija 2013 presentation material

You can download (852.96 kb) or view online the PowerPoint file from my State of the art logging presentation.

Below is a video demonstrating the Webcom GUI for log analysis and system monitoring.

Here is the source code of the other presentation demo, which uses Glimpse, NLog, Autofac, Castle Dynamic Proxies, and some of my NLog extensions to demonstrate some cool stuff. Remember to restore NuGet packages if Visual Studio doesn’t do so on build.

Here are some more useful resources used for presentation:

 

Speaking at Sinergija 2013

[Sinergija logo]

I will be speaking at this year’s Sinergija.

The presentation is titled State of the art logging, and here are my incomplete notes for it:

  • How to build state-of-the-art production system instrumentation so you can easily understand what is going on in your production system?
  • Which component to use? Log4Net, NLog, System.Trace, custom solution…
  • Event vs. Trace (Monitoring vs. Performance)
  • Negative events
  • Tracing steps with “using” notation (e.g. MiniProfiler)
  • AOP vs. manual logging
  • Where to output data?
  • How to scope data?
  • How not to kill app performance?
  • How to bake an app profiler into your tracing lib?
  • Demo of Webcom’s powerful GUI for the trace analytics system.
  • Demo of AOP logging in MVC web app using NLog.

Goodbye Soprex

A few months ago I left Soprex and joined Webcom. I worked at Soprex for more than 5 years, and in those years I worked with some amazing people with whom I built some amazing apps I’m proud of:

  • B5 (Big Five) - A multi-portal web application that hosts multiple sites with the following major features: ePortal, eCatalogue, eShop and eStock. ePortal is a CMS system that can host multiple sites which can share pages. eCatalogue allows web portals to expose a product catalog to site visitors and also allows sharing of products and product groups among portals. eShop allows site visitors to order products whose availability is maintained through eStock.
  • Soprex Core Application (SCA) - A framework for building enterprise applications written in .NET. Among many other things, it includes an implementation of domain objects, a micro-ORM, a code profiler, and a custom MVP implementation that can run Windows and web applications.
  • Price and Quotation (P&Q) - An application for quoting and ordering built for SKF’s distributor network. The application is based on SCA and includes identical Windows and web GUIs. P&Q implements very complex pricing business rules that in turn allow the user maximum flexibility in creating offers for customers.

 

Below is a photo of my favorite moment in my career, when our customer from Germany sent us cookies to express their gratitude for all the hard work we did for them.

[Photo of the cookies]

Love you guys… make tons of money and give me some of it :D

About data caching

In my Code Project article “Demystifying concurrent lazy load pattern” I explained the common mistakes in lazy load implementations and how to implement a fast concurrent lazy load cache. In this post I will discuss why caching itself is important and what some implementation strategies for caching are.

 

Why caching?

 

Simply put... performance! The usual business application relies on lots of data, and that data usually resides in a database. Because the data lives outside of the application process, accessing it can be much slower. If external data is on a hard drive (and it usually is), accessing it can be thousands of times slower than accessing data in application memory. The great thing about cached data is that not only will it be a few thousand times faster, it will also scale much better under heavy load than your poor database.

Nowadays, cached data is becoming more and more important with the emergence of high load web sites with many concurrent users. In addition, we see the rise of distributed key-value databases acting as a shared cache.

In our application, we increased the speed of a critical business feature, Order lines import, by about 50 times with data caching: the speed went from 1 row per second to 50 rows per second.

Sounds great, and best of all it is easy – it just takes caring a little about your data. Unfortunately, I have seen many problematic implementations of caching that either occasionally break in production or have very low performance, and those implementations are the main reason for writing this article.

 

Caching strategies

 

There are several choices to make when implementing a data cache. Some of the facts that could affect these decisions are:

  • The size of the data.
  • How and how often is the data used?
  • How and how often is the data changed?
  • Can eventual consistency (stale data) be tolerated?

 

I will give my favorite example. Content Management System (CMS) sites are usually completely dynamic, which means that site menus and page content are created from some model usually defined in a database. This data is rarely changed and is accessed very often. Imagine such a site with a thousand concurrent users and a busy administrator adding new pages every day. It might seem like the data is changed often, but in fact the data will be accessed millions of times before it is changed. Once a day might not seem rare, but measured in processor ticks it is an eternity. Also, this data is very small, which means caching it will not have a big impact on server memory, and if some user does not see new pages for a few minutes it is not that big a deal.


Data loading

 

In my lazy load pattern article, I was loading all Products at once when building a cache. An alternative to this approach is to load a single Product once it is requested, if it is not already loaded in the cache. Choosing the right implementation depends on the size of the cached data and its usage pattern.

As I mentioned earlier, Order lines import is a critical process in our application. In this process, we access Supplier data for each Order line that is being imported, and there can be a few thousand lines in an import. This means we will be accessing the Supplier DB table a few thousand times per process. On the other hand, the Supplier table in the DB is quite big, and loading it into memory would take too much time and memory. It turns out that in practice, for one importing process (and all the lines being imported in it) only a few Suppliers are needed, so single-Supplier caching improved our performance significantly with just 10 lines of code.
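
A minimal sketch of this kind of single-item caching (Supplier is a placeholder type and the DB loader is an injected delegate; this is not the actual production code):

using System;
using System.Collections.Concurrent;

public class SupplierCache
{
    private readonly ConcurrentDictionary<int, Supplier> _cache =
        new ConcurrentDictionary<int, Supplier>();
    private readonly Func<int, Supplier> _loadFromDb; // hypothetical DB loader

    public SupplierCache(Func<int, Supplier> loadFromDb)
    {
        _loadFromDb = loadFromDb;
    }

    public Supplier Get(int supplierId)
    {
        // The DB is hit only on a cache miss; repeated requests for the
        // same Supplier during one import are served from memory.
        return _cache.GetOrAdd(supplierId, _loadFromDb);
    }
}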

 

Data accessing

 

Once you load data in cache, how do you access it? Here are some of the alternatives:

  • Sequentially - by iterating through the whole list of Products and looking for the matching Product.
  • Directly - by building some kind of memory index. For example, once we have loaded Products we could build a hash table in which the key is the Product name and the value is a reference to the Product instance with that name. This would give us direct access to a Product instance by its name (see the sketch after this list).
  • Combined - by splitting our Product list into smaller lists by some criteria. For example, we access the list of Products with some Product Type directly using a memory index and then iterate over it. Combined access is a great choice when you have various criteria for data searching and direct access cannot be applied, or when preparing all the memory indexes would be too time or memory consuming.
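
Here is a minimal sketch of the direct approach mentioned above (it assumes a Product class with a Name property; not the actual implementation):

using System.Collections.Generic;
using System.Linq;

public class ProductIndex
{
    private readonly Dictionary<string, Product> _byName;

    public ProductIndex(IEnumerable<Product> products)
    {
        // The index is built once, when the cache is loaded.
        _byName = products.ToDictionary(p => p.Name);
    }

    public Product GetByName(string name)
    {
        // O(1) lookup instead of scanning the whole list.
        Product product;
        _byName.TryGetValue(name, out product);
        return product; // null if no product with that name is cached
    }
}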

 

When choosing your data access approach you have to weigh the performance gain against the complexity increase. Here is an example of the performance gain we got after refactoring our Price Calculator cache:

  • Sequential access - 47ms
  • Combined - 0.02ms

 

Sometimes it is important for some application logic to work with the data from one point in time because otherwise inconsistent data might appear in the application. If this is the case, logic executing over cached data must be aware that the cache might reload at any time. 

For example, we have a rule that our database contains product A or product B – one of them must exist in the database. Let us say that at time t0 the DB contains only product B, and at time t1 a transaction occurs in the DB that adds product A and removes product B.

Below is the code demonstrating the potential problem in this case. 

 

Code sample 1: Inconsistent data cache

// t0: Product A doesn’t exist
if (!this.HasA(Cache.GetProducts()))
{
    // t1: cache is reset (A added, B removed)
    // Product B doesn’t exist so else branch is executed        
    if (this.HasB(Cache.GetProducts()))
    {
        this.UseB(Cache.GetProducts());
    }
    else
    {
        // we reach this part of code which means that
        // DB doesn’t contain product A nor B
        // and this case is not possible
    }
}

The solution is again simple: we store the cached list in a local variable to make sure we work with data from one point in time, when this is important.

 

Code sample 2: Consistent data cache

// t0: Product A doesn’t exist 
List<Product> localProducts = Cache.GetProducts();
if (!this.HasA(localProducts))
{
    // t1: cache is reset (A added, B removed)
    // but it doesn’t affect our local variable
    if (this.HasB(localProducts))
    {
        this.UseB(localProducts);
    }
}

 

Cache scope

 

It is important to understand who is sharing cached data, because this fact influences the cache implementation. The goal is also to narrow down the scope as much as possible to gain naturally partitioned data that will have a lower data load and fewer concurrent reads.

Here are the cache scopes I could think of:

  • Multiple applications - The most complex scenario, where data is shared by users of multiple applications. Consider using third-party distributed key-value stores to handle the complexity of a distributed cache, or consider having a separate cache for each application to avoid that complexity. Potential problems of the separate-cache approach are inconsistent data across applications or too big a data load.
  • Single application - Data is shared by users of a single application.
  • Database - If you have a single application that connects to different databases when a user logs in, then you should probably cache data per database instance and not per application instance.
  • Session - Usually implies user-specific data. The nice thing about this cache is that if you store it in the application session store, it will naturally expire once the session expires (the user logs out or is inactive for some time).
  • Use case - In this case, the cache is usually reset once the form is closed.

 

Resetting cache


How?

 

Usually we can reset a cache by reloading the data, or by setting it to null and letting it lazy load on the first data request.

In our examples, we implemented cache resetting by setting the cached Product list to null.
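
A minimal, non-thread-safe sketch of that reset-to-null approach (the DB loader is a hypothetical injected delegate; the concurrent version is what the Code Project article mentioned above covers):

using System;
using System.Collections.Generic;

public class ProductListCache
{
    private readonly Func<List<Product>> _loadFromDb; // hypothetical DB loader
    private List<Product> _products;

    public ProductListCache(Func<List<Product>> loadFromDb)
    {
        _loadFromDb = loadFromDb;
    }

    public List<Product> GetProducts()
    {
        if (_products == null)
            _products = _loadFromDb(); // lazy load on the first request after a reset

        return _products;
    }

    public void Reset()
    {
        _products = null; // cheap, and no duplicated loading logic here
    }
}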

There are a few reasons why I do not recommend reloading cache data on cache resetting:

  1. It is slower
  2. Loading logic is duplicated
  3. It is a waste of resources. Instead of loading data as needed, we are always loading it.

 

When?

 

The easiest caching scenario is the one where only the application implementing the cache alters the cached data in the DB. In this case, the cache should be reset as soon as the data is changed by the application.

If the data is changed by some other application, then caching becomes trickier. You can consider caching data for some time interval if you can tolerate inconsistent data in between. For example, can we tolerate that our product list is refreshed every 30 minutes and can contain stale data?

In our application, we have a service synchronizing our database with external data sources once or a few times a day. Once this service is done with synchronization, it calls a web service to notify the application that synchronization is done. We use this notification to reset the cached data that might have been updated by the service.

There have been several attempts in .NET to have a cache that resets automatically once some table in MS SQL Server changes. We tried this approach a few years ago and it caused us many headaches.

 

Conclusion

 

Caching is something you will probably need in your applications, although caching is not perfect and has its downsides. One of the downsides is increased application complexity. You should always weigh the added complexity against the performance gains and add caching only if it significantly increases your performance or scalability.

Do not underestimate the power of the DB query engine and DB indexes, and do not use that underestimation as an excuse for caching everything. If your DB data access is slow, maybe you should double-check your DB indexes. If you are sure you need the full edition of Oracle to get good DB performance, check the indexes again. Bear in mind that you do not want to end up building your own in-memory query language, because you will fail. DBs are a better option than a cache for large data and for complex data matching criteria.

In our application, we have around 500,000 Products in the DB, and guess what? We are not caching them, because building the cache would take too much time and memory. In addition, we have some advanced Product search queries in T-SQL that we did not want to code in C#. By the way, when we first “optimized” our critical Product search DB queries, we got an average execution time of 1 s. Once we dug into the indexes and really optimized the queries, we got an average of 0.01 s.