.NET Garbage Collector Part III: Generations

Overview

The CLR garbage collector implements a generational model in which the garbage-collected heap is partitioned into three generations: Generation 0, Generation 1 and Generation 2. The purpose of this model is to improve collection performance by running partial garbage collections instead of always scanning the whole heap.

This model is built on two basic assumptions: young objects are expected to die quickly, and old objects are expected to live for a long time. In addition, it is assumed that collecting part of the heap is faster than collecting all of it. It is worth mentioning that "young" and "old" are relative terms for .NET objects: they depend on how often the application triggers garbage collections. When collections are infrequent, an object created 15 seconds ago can still be considered young, whereas the same object would be considered old when collections are more frequent.
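The generational layout can be inspected from managed code. The following is a minimal sketch using the standard System.GC API; the class and method names (GenerationBasics, GenerationCount, GenerationOfNewObject) are made up for illustration.

```csharp
using System;

public static class GenerationBasics
{
    // GC.MaxGeneration is zero-based, so the number of generations is one more.
    public static int GenerationCount() => GC.MaxGeneration + 1;

    // Asks the runtime which generation a freshly allocated small object is in.
    public static int GenerationOfNewObject() => GC.GetGeneration(new object());

    public static void Main()
    {
        Console.WriteLine($"Generations reported by the runtime: {GenerationCount()}");
        Console.WriteLine($"A new object starts in generation {GenerationOfNewObject()}");
    }
}
```

On the CLR this should report three generations, with the new object in generation 0.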

Generation 0

This generation contains the youngest objects: all newly created objects are allocated here. When the managed heap is initialized it contains no objects, and as new objects are created they are placed in Generation 0. This memory space is small and typically starts with a budget between 256 KB and 4 MB. It is kept small so that it can be traversed in very little time; on the other hand, it is far too small to hold all the memory that even a small application uses. When the CLR tries to allocate a new object and there is not enough memory left in Generation 0, a garbage collection of Gen 0 is triggered.

The GC then marks the Gen 0 objects that are still reachable, reclaims the unreachable ones and compacts the survivors, which are promoted to Gen 1. It's important to note that at this stage the GC marks only objects that belong to Generation 0. A Gen 0 collection is very efficient for several reasons:

  1. The budget of Gen 0 is small (as mentioned earlier, from 256 KB to 4 MB, although it can be adjusted dynamically). As a consequence, checking this small amount of memory takes very little time
  2. Most Gen 0 objects reference other Gen 0 objects, so their memory is close together in space
  3. The size of Gen 0 is chosen based on the processor cache size, which increases the likelihood that Gen 0 objects are found in the cache; this, in turn, improves performance
  4. The GC only works on the part of the heap that needs it. When a collection starts, the GC checks whether the memory occupied by Gen 1 is close to its threshold; if the current size is well below it (much less than 2 MB), only Gen 0 is collected
  5. The GC doesn't traverse every reference held by Gen 0 objects: when an object references an object that belongs to an older generation, the GC doesn't follow that reference, which shortens the time needed to build the graph of reachable objects

All survivors are promoted to Gen 1. In fact, they are copied from the Gen 0 region to a new region in Gen 1. This copying is not expensive, because very few objects are expected to survive a Gen 0 collection.
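To see allocation pressure trigger a Gen 0 collection, one can allocate short-lived objects until the runtime reports a new collection. This is only a rough sketch: the helper name BytesUntilGen0Collection, the 1 KB chunk size and the 4 GB cap are arbitrary choices made for illustration, and the figure it returns is merely an approximation of the current Gen 0 budget, since other threads and runtime settings influence when a collection runs.

```csharp
using System;

public static class Gen0Trigger
{
    // Allocates short-lived 1 KB arrays until GC.CollectionCount(0) changes.
    // Returns roughly how many bytes were requested before that happened,
    // or -1 if no collection was observed within the (arbitrary) cap.
    public static long BytesUntilGen0Collection(long capBytes = 4L * 1024 * 1024 * 1024)
    {
        const int chunk = 1024;
        int startCount = GC.CollectionCount(0);
        long allocated = 0;

        while (allocated < capBytes)
        {
            var garbage = new byte[chunk];
            GC.KeepAlive(garbage); // ensures the allocation is not optimized away
            allocated += chunk;

            if (GC.CollectionCount(0) != startCount)
                return allocated;
        }
        return -1;
    }

    public static void Main()
    {
        long bytes = BytesUntilGen0Collection();
        Console.WriteLine($"About {bytes / 1024} KB were allocated before a Gen 0 collection ran");
    }
}
```

The result varies between machines and GC modes (workstation or server), which is consistent with the budget being adjusted dynamically.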

Generation 1

Generation 1 is where the objects that survive a Gen 0 collection are placed. It can be considered a buffer between Gen 0 and Gen 2. It is usually larger than Gen 0 but still significantly smaller than the whole available memory space; its size usually ranges from 512 KB to 4 MB.

A Gen 1 collection is also a partial garbage collection. It runs when Gen 1 becomes full, and since Gen 1 only grows through promotions, it can only be triggered as a consequence of a Gen 0 collection. It is usually a cheap, quick and highly efficient process. This is also the stage at which objects with finalizers that survived the first collection can be reclaimed, provided their finalizers have already run.

All objects that survive a Gen 1 collection are promoted to Gen 2, which means that from then on they are considered old objects.
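The promotion path can be observed by forcing collections and asking the runtime for an object's generation after each one. This is a sketch for experimentation only (forcing collections with GC.Collect is rarely a good idea in production code), and the names PromotionPath and Observe are illustrative.

```csharp
using System;

public static class PromotionPath
{
    // Records the generation of one live object before any collection,
    // after a forced Gen 0 collection, and after a forced Gen 1 collection.
    public static int[] Observe()
    {
        var survivor = new object();
        var seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);

        GC.Collect(0); // collect Gen 0 only
        seen[1] = GC.GetGeneration(survivor);

        GC.Collect(1); // collect Gen 0 and Gen 1
        seen[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor); // keep the object reachable until this point
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" -> ", Observe()));
    }
}
```

The object is typically reported in generations 0, 1 and 2 in turn, although the runtime does not strictly guarantee a promotion on every collection.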

In this regard, it is worth mentioning that the generational model carries a risk: some temporary objects may reach Gen 2 only to die shortly afterwards. This phenomenon is known as the mid-life crisis and should be avoided as far as possible if we want to keep memory usage healthy.

Generation 2

This part of the heap is reserved for objects that have survived the previous collections in Gen 0 and Gen 1. Its size can span the entire memory of the OS process, that is to say, up to 2 GB on a 32-bit system and up to 8 TB on a 64-bit one. Gen 2 also has specific memory thresholds (watermarks) that trigger a Gen 2 collection. Common values are 16 MB on 32-bit systems and from 128 MB to 2 GB on 64-bit systems.

A collection of Gen 2 is considered a full garbage collection, and it takes much longer to complete than a collection of the younger generations. In addition, a Gen 2 collection is far less efficient than the previous ones: keep in mind that at this point the collector is trying to reclaim memory from older objects which, according to the assumptions of the generational model, are expected to live even longer.
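How rarely full collections happen compared to partial ones can be seen by comparing GC.CollectionCount for each generation after a workload. The sketch below uses an invented workload (many small temporary buffers and a few retained ones); the names CollectionCounts and RunWorkload and the iteration count are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionCounts
{
    // Runs an allocation-heavy loop and returns the number of collections
    // the runtime reports for each generation (index = generation).
    public static int[] RunWorkload(int iterations = 2_000_000)
    {
        var retained = new List<byte[]>();

        for (int i = 0; i < iterations; i++)
        {
            var buffer = new byte[256];              // mostly short-lived garbage
            if (i % 1000 == 0) retained.Add(buffer); // a few long-lived objects
        }
        GC.KeepAlive(retained);

        var counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            counts[gen] = GC.CollectionCount(gen);
        return counts;
    }

    public static void Main()
    {
        int[] counts = RunWorkload();
        for (int gen = 0; gen < counts.Length; gen++)
            Console.WriteLine($"Gen {gen} collections: {counts[gen]}");
    }
}
```

Because collecting a generation also collects the younger ones, the Gen 0 count is never lower than the Gen 1 count, which in turn is never lower than the Gen 2 count. In a healthy application the Gen 2 figure should stay small compared to the other two.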

What about the LOH?

The Large Object Heap (LOH) is a specific area of the heap in which the CLR allocates large objects, that is, objects of 85,000 bytes or more. It is worth mentioning that this threshold applies to the object itself and not to the graph of objects it references. For performance reasons, large objects do not go through the Gen 0 to Gen 1 to Gen 2 promotion path, which saves the GC from having to copy their memory between generations. Instead, the GC uses a different mechanism for large objects: it keeps a simple linked list of all free memory blocks, and when a new allocation is requested the CLR has to find a suitable free block, splitting free blocks into pieces if necessary. This departs from the main principles of the GC, but it is necessary to preserve performance.

Collections of the LOH are strongly tied to Gen 2 collections. In general, the LOH is collected when the Gen 2 memory threshold is reached; in the same way, if the LOH threshold is reached, a new Gen 2 collection is performed.
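The link between the LOH and Gen 2 is visible through GC.GetGeneration, which reports large objects as belonging to generation 2 from the moment they are allocated. The following sketch compares a small and a large byte array; the class name LohDemo and the array sizes are chosen only for illustration.

```csharp
using System;

public static class LohDemo
{
    // Allocates a byte array of the given length and returns the generation
    // the runtime reports for it immediately after allocation.
    public static int GenerationOfNewArray(int lengthInBytes)
    {
        var array = new byte[lengthInBytes];
        int generation = GC.GetGeneration(array);
        GC.KeepAlive(array);
        return generation;
    }

    public static void Main()
    {
        // Well below the large-object threshold: a regular Gen 0 allocation.
        Console.WriteLine($"1,000-byte array:   generation {GenerationOfNewArray(1_000)}");

        // Above the threshold: allocated on the LOH, reported as generation 2.
        Console.WriteLine($"100,000-byte array: generation {GenerationOfNewArray(100_000)}");
    }
}
```

This is also why frequent allocation of large temporary buffers tends to be costly: they can only be reclaimed by a Gen 2 collection.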