Memory allocation summary

A good understanding of GC helps us understand the design of .NET and why .NET Core has an advantage in cloud-native development; it is also part of a programmer's growth to a higher level. Before looking at GC itself, let's review the basics of memory allocation in .NET. .NET memory is divided between managed and unmanaged resources: managed resources are allocated on the managed heap and managed by the CLR, while unmanaged resources are allocated on the unmanaged heap and must be released by the developer. This section focuses on the allocation of managed resources. The CLR supports two basic kinds of types, value types and reference types, and uses two allocation strategies for them at runtime: value types are typically allocated on the thread stack (or inline within their containing object), while reference types are always allocated on the managed heap.
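A minimal sketch of the two allocation strategies (the `Point` and `Person` types below are illustrative, not from any library):

```csharp
using System;

struct Point            // value type: allocated on the stack (or inline)
{
    public int X, Y;
}

class Person            // reference type: always allocated on the managed heap
{
    public string Name;
}

class Program
{
    static void Main()
    {
        Point p = new Point { X = 1, Y = 2 };        // no heap allocation for p itself
        Person person = new Person { Name = "Tom" }; // heap object; person holds a reference
        Console.WriteLine($"{p.X}, {person.Name}");
    }
}
```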

It should be noted that the CLR also maintains a pointer called NextObjPtr, which points to the location where the next object will be allocated in the heap. At initialization, NextObjPtr is set to the base address of the address space region. After a region is filled with non-garbage objects, the CLR allocates more regions, and the pointer keeps advancing. The new operator returns a reference to the object. Just before returning the reference, NextObjPtr is advanced by the number of bytes the object occupies; the resulting value is the address at which the next object will be placed in the managed heap.
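The NextObjPtr behavior can be sketched with a simplified bump-pointer allocator. This is only an illustration of the idea, not the CLR's actual implementation:

```csharp
using System;

// Simplified model of the CLR's bump-pointer allocation.
class BumpAllocator
{
    private readonly byte[] _heap;
    private int _nextObjPtr;          // analogous to NextObjPtr

    public BumpAllocator(int size)
    {
        _heap = new byte[size];
        _nextObjPtr = 0;              // starts at the base of the region
    }

    // Returns the "address" (offset) of the new object, then bumps the pointer.
    public int Allocate(int objectSize)
    {
        if (_nextObjPtr + objectSize > _heap.Length)
            throw new OutOfMemoryException("Region full - this is where a GC would be triggered.");

        int objectAddress = _nextObjPtr;
        _nextObjPtr += objectSize;    // the next object goes right after this one
        return objectAddress;
    }
}
```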



Garbage collection algorithm and GC operation mechanism

Commonly used garbage collection algorithms mainly include reference counting and reference tracking. Reference counting has obvious drawbacks (it cannot handle circular references, for example). The garbage collection algorithm used by .NET is reference tracking. Small note: regarding garbage collection algorithms, I remember one classic interview question: if there is a circular reference in C#, will it cause a memory leak? If you understand the two algorithms, you will know that it will not.
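The circular-reference question can be verified with a small sketch (the `Node` type is illustrative; note that in Debug builds the JIT may extend local lifetimes, so the result is most reliable in Release mode):

```csharp
using System;

class Node
{
    public Node Next;
}

class Program
{
    static WeakReference MakeCycle()
    {
        var a = new Node();
        var b = new Node();
        a.Next = b;
        b.Next = a;               // circular reference between a and b
        return new WeakReference(a);
    }

    static void Main()
    {
        WeakReference weak = MakeCycle();
        // No GC root reaches the cycle any more, so reference tracking collects it.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.WriteLine(weak.IsAlive);   // typically False: the cycle was collected
    }
}
```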

GC Root

The reference tracking algorithm uses a set of GC Root objects as starting points and searches downward from them. The search paths form reference chains. When an object is not reachable from any GC root through a reference chain, the object can be collected.

GC roots can be illustrated with a reference tree:



The GC root lives on the stack and points to a Teacher object. The Teacher object references an ArrayList order collection, and the collection in turn holds references to its elements; the tree grows as the depth of the search increases.

GC roots come from the following sources:

  • the stack
  • global or static variables
  • CPU registers
  • interop references (.NET objects used in COM / API calls)
  • object finalization references (the finalization queue)

GC operating mechanism

The GC introduces the concept of generations and divides the heap into three of them: G0, G1, and G2. Objects in G0 have the shortest lifetimes, and the higher the generation, the longer its objects tend to live (large objects are allocated directly into G2, and because G2 is not scanned on every collection, objects in G2 in most cases live longer than those in G0).
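Generations can be observed directly with `GC.GetGeneration` (a small illustrative sketch):

```csharp
using System;

class Program
{
    static void Main()
    {
        var obj = new object();
        Console.WriteLine(GC.GetGeneration(obj));   // a new small object starts in generation 0

        GC.Collect();                               // survivors are promoted
        Console.WriteLine(GC.GetGeneration(obj));   // promoted to generation 1

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj));   // promoted to generation 2

        Console.WriteLine(GC.MaxGeneration);        // 2: the generations are G0, G1, G2
    }
}
```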

The GC runs as shown below:



It should be noted that when the CLR wants to perform a garbage collection, it immediately suspends all threads that are executing managed code; threads that are executing unmanaged code are not suspended. So in a multi-threaded environment, all kinds of strange problems may appear.

The following figure shows the overall operation of the GC, which consists of five steps:



Garbage collection timing and mode

The CLR will perform a GC when any of the following occurs:

1. The allocation budget of a generation reaches its threshold and space cannot be allocated for a new object, for example when generation 0 is full;

2. System.GC.Collect() is called explicitly (explicit calls should be used with caution, because manual collections may conflict with automatically triggered GCs and cause unpredictable problems);

3. Other special circumstances, such as low operating-system memory, the CLR unloading an AppDomain, the CLR shutting down, and even, in some extreme cases, changes to system parameter settings.
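When an explicit collection really is needed (case 2 above), the overloads that let the GC decide whether a collection is productive reduce the risk of fighting the automatic GC (illustrative sketch):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Forces an immediate, blocking collection of all generations - use sparingly.
        GC.Collect();

        // Gentler: collect up to generation 1, and only if the GC
        // itself judges the collection to be productive.
        GC.Collect(1, GCCollectionMode.Optimized);
    }
}
```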

The main GC modes are:

  • WorkStation GC
  • Server GC
  • Concurrent GC
  • Non-Concurrent GC
  • Background GC
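These modes are selected via runtime configuration; for example, in runtimeconfig.json the standard settings `System.GC.Server` (server vs. workstation GC) and `System.GC.Concurrent` (background/concurrent GC) can be set:

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.Concurrent": true
    }
  }
}
```

The equivalent MSBuild properties are `ServerGarbageCollection` and `ConcurrentGarbageCollection` in the project file.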


GC optimization processing in .NET Core 3.0

.NET Core 3.0 supports Docker resource limits better by default, and the official team is working hard to make .NET Core a truly container-aware runtime that runs efficiently in low-memory environments.

GC heap limit

.NET Core reduces the memory used by CoreCLR by default, for example by shrinking the generation 0 allocation budget, to better match modern processor cache sizes and cache hierarchies.

Under the new heap-count strategy, the GC reserves a memory segment of at least 16 MB per heap, which works well on machines with low memory limits. On a multi-core machine with a memory limit but no CPU core limit, there is no longer a need to create one heap per core: for example, with a 160 MB memory limit on a 48-core machine, 48 GC heaps are not created; instead, a 160 MB limit results in only 10 GC heaps. If no CPU limit is set, the application can still take advantage of all the cores on the machine.

With this new strategy, there is no longer any need to manually switch .NET Core applications to the workstation GC when running them in Docker environments.

Support for Docker memory limits

Docker resource limits are built on top of cgroups, a Linux kernel feature. From the runtime's perspective, this means targeting the cgroup primitives.

.NET Core 3.0 memory usage rules when setting cgroup limits:

  • Default GC heap size: the larger of 20 MB or 75% of the cgroup memory limit on the container
  • Minimum reserved segment size per GC heap: 16 MB, which reduces the number of heaps created on machines with many cores and small memory limits

To support the container scenario, two HardLimit configurations have been added:

  • GCHeapHardLimit – specifies the hard limit of the GC heap
  • GCHeapHardLimitPercent – specifies the GC heap hard limit as a percentage of the physical memory available to the process

If both are specified, GCHeapHardLimit is checked first, and GCHeapHardLimitPercent is checked only if GCHeapHardLimit is not specified.
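For example, a 100 MB hard limit can be set in runtimeconfig.json (as far as I know, `System.GC.HeapHardLimit` takes a byte count; the corresponding environment variable `COMPlus_GCHeapHardLimit` takes a hex value):

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.HeapHardLimit": 104857600
    }
  }
}
```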

If neither is specified, but the process is running in a memory-restricted container, the default is to use the following settings:

max(20 MB, 75% of the container memory limit)

If a hardlimit configuration is specified and the program is used in a memory-constrained container, the GC heap usage will not exceed the hardlimit limit, but the total memory is still limited by the container’s memory. So when we count memory consumption, we derive data based on container memory limits.


For example, suppose the process runs in a container with a 200 MB limit, and the user has also configured GCHeapHardLimit to 100 MB.

If the GC is using 50 MB of its 100 MB heap limit, and 100 MB of the container limit is used for other purposes, the memory consumption is (50 + 100) / 200 = 75%.

The GC collects and releases memory more aggressively as the heap approaches the GCHeapHardLimit, in order to keep more memory available so that the application can continue to run safely. However, if the algorithm determines that collections are inefficient at that point, the GC avoids continuously running full blocking collections.

Even if the GC heap is fully compacted, the GC will still throw an OutOfMemoryException if an allocation would push the heap size beyond the GCHeapHardLimit.

Thus, .NET Core 3.0 is designed to run stably in resource-constrained containers.

Support for Docker CPU restrictions

When a CPU limit is set, the value configured on Docker is rounded up to the next integer. That value is the maximum number of effective CPU cores used by CoreCLR.
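For example (the image name below is hypothetical), a fractional CPU limit is rounded up by the runtime:

```shell
# --cpus=1.5 is rounded up: CoreCLR treats this as 2 effective cores
docker run --cpus=1.5 -m 200m myapp:latest
```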

By default, ASP.NET Core applications enable the server GC (console applications do not), because it provides high throughput and reduces contention across cores. When the process is limited to a single processor, the runtime automatically switches to the workstation GC; even if you explicitly request the server GC, the workstation GC is always used in a single-core environment.

By taking the CPU limit into account when calculating CPU busy time, the thread pool avoids oscillating between two competing heuristics:

  • Try to allocate more threads to increase CPU busy time
  • Try to allocate fewer threads because adding more threads will not increase throughput
