In this article we will review traditional .NET threading and asynchronous programming models:
Traditional threading basics
The traditional APM pattern
A brief overview of the EAP pattern
APM vs EAP
The new Task Parallel Library (TPL)
The new asynchronous programming model for I/O-bound operations (async & await)
Other parallelization techniques, such as the Parallel class and PLINQ
Parallelization (and its precursor, multithreading) is a key concept in today's software development. The current proliferation of multi-core machines (some of them with more than one CPU) makes parallel programming necessary in order to get proper capabilities from the hardware. Before the arrival of .NET Framework 4.0, the only solutions for directly coding multithreaded applications were manual thread creation, the .NET Thread Pool, or the traditional APM (Asynchronous Programming Model) and the Event-Based Asynchronous Pattern (EAP).
Currently the .NET Framework and the Windows Runtime provide high-level constructs that allow modeling these scenarios in an easy and simple way; even more, these new components provide clear solutions for well-known multithreading and asynchronous development issues such as exception handling and returning values from asynchronous executions.
Before reviewing these solutions, let's take a brief look at traditional threading and APM concepts (which are still available today and in fact are commonly used behind the scenes) and some internal key aspects that are also involved in multithreading scenarios:
Traditional Threading Basics:
Threads are the basis of high-performance applications; they enable the OS to stay responsive while applications run long-running executions. They are, in essence, a pure form of CPU virtualization, which inevitably means memory consumption and runtime performance overhead. The .NET Framework provides the ability to create and run your own threads by using a set of types like Thread, ThreadStart, ParameterizedThreadStart, etc. These are some of the basic types that we usually manage for modeling simple multithreading scenarios; the ThreadStart and ParameterizedThreadStart delegates can be used to wrap the method that you want to run on a specific thread, depending on whether the method takes parameters.
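As a minimal sketch (the worker logic and argument are invented for illustration), creating a thread with a ParameterizedThreadStart-compatible delegate and waiting for it with Join() looks like this:

```csharp
using System;
using System.Threading;

string message = "";

// A single-parameter lambda matches the ParameterizedThreadStart delegate,
// so the argument passed to Start() arrives as the object parameter.
var worker = new Thread(obj => { message = $"processed:{obj}"; });
worker.Start("Alice");

// Join() blocks the caller until the worker thread finishes.
worker.Join();
Console.WriteLine(message);
```

Because Join() is called before reading `message`, the main thread is guaranteed to observe the value written by the worker.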
In the same way, when using multiple threads you can call the Join() method on each thread to force the execution to wait until that specific thread finishes (this is similar to the Wait() method on the Task class). A thread can also be stopped by invoking the Abort() method, although today it is advisable to use a shared variable between the target and caller threads to perform the stop, or a more elegant solution based on CancellationTokenSource and CancellationToken, which is also supported by TPL. If you do use Abort(), you have to be careful and use some kind of critical region to avoid leaving the AppDomain in an inconsistent state (for instance, you can use the static Thread.BeginCriticalRegion() and Thread.EndCriticalRegion() methods to mark a region where an Abort() could leave the AppDomain inconsistent, so the host can react accordingly).
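The cooperative alternative to Abort() mentioned above can be sketched with a CancellationTokenSource (a minimal example, not production code):

```csharp
using System;
using System.Threading;

var cts = new CancellationTokenSource();

// The worker polls the token and exits cleanly instead of being aborted.
var worker = new Thread(() =>
{
    while (!cts.Token.IsCancellationRequested)
    {
        Thread.Sleep(1); // simulated unit of work
    }
});

worker.Start();
Thread.Sleep(50);   // let the worker run for a moment
cts.Cancel();       // request a clean, cooperative stop
worker.Join();      // wait for the worker to observe the token and finish
Console.WriteLine("Worker stopped cleanly");
```

The worker decides where it is safe to stop, which is exactly what Abort() cannot guarantee.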
One of the most complex topics related to multithreading is data-access synchronization, more commonly known as thread synchronization; that is to say, synchronizing and throttling the access to objects for those threads that need to use them. The main objective of synchronization locks is to let you coordinate access to objects that occurs at the same time. This is an important point to be aware of, because thread synchronization is all about timing: if the same resource is accessed by two different threads but the accesses can never be simultaneous, then thread synchronization is not needed at all. Along this line, the .NET Framework provides an ample spectrum of classes that cover all these scenarios.
One of these classes, which is perfect for some value types (like int counters), is the Interlocked class, which implements Increment/Decrement methods for performing atomic increase/decrease operations. As commented, this class is valid only for a small set of types; if you want to lock access to custom .NET classes and other reference types, you can use the lock statement for performing basic locking operations. The lock statement internally uses a more specialized class, Monitor, which implements methods that can help prevent synchronization problems like deadlocks (which occur when thread locks block threads against each other). In particular, Monitor implements the TryEnter() method, which is useful for preventing deadlocks.
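A small sketch contrasting the two approaches: Interlocked for an int counter, and the lock statement (Monitor underneath) for arbitrary shared state. Both counters reach the expected total only because access is synchronized:

```csharp
using System;
using System.Threading;

int atomicCounter = 0;
int lockedCounter = 0;
var gate = new object();

var threads = new Thread[10];
for (int i = 0; i < threads.Length; i++)
{
    threads[i] = new Thread(() =>
    {
        for (int j = 0; j < 1000; j++)
        {
            Interlocked.Increment(ref atomicCounter); // lock-free atomic add
            lock (gate) { lockedCounter++; }          // Monitor.Enter/Exit pair
        }
    });
    threads[i].Start();
}
foreach (var t in threads) t.Join();

Console.WriteLine($"{atomicCounter} {lockedCounter}"); // 10000 10000
```

Removing either the Interlocked call or the lock block makes the corresponding counter come out short under contention, because plain `++` is a non-atomic read-modify-write.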
Thread Synchronization across AppDomains and Processes
From a general perspective, it is advisable to reduce as much as possible the amount of code that needs to be locked. All the previously mentioned classes help with synchronization locks inside the scope of the current application domain and process, but what if we need to cross these boundaries?
For such cases we need another synchronization mechanism that provides the needed cross-AppDomain and even cross-process coverage: synchronization with kernel objects like Mutex, Semaphore and Event. The .NET Framework exposes these classes as thin wrappers over the corresponding operating system objects, providing a great deal of flexibility and control for managing synchronization scenarios... so far no issues, but...
...the use of these classes comes with a specific price in terms of overhead: these are, in fact, heavyweight objects (as a matter of fact, using a Mutex is roughly 30 times slower than using the Monitor class). Nevertheless, even with this overhead, these classes allow synchronization operations that are impossible with Monitor. The Mutex class, for instance, allows locking data across AppDomain and process boundaries; Semaphore creates a kernel object that throttles the access to a specific resource, properly speaking providing a number of valid slots, and when these are filled the remaining requests have to wait until one is freed.
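As an in-process illustration of that slot-based throttling (a named Mutex or Semaphore behaves the same way across processes), here is a Semaphore with two slots; the `peak` counter, added just for the demonstration, confirms that no more than two threads ever hold a slot at once:

```csharp
using System;
using System.Threading;

// Two slots: at most two threads may be inside the protected section.
var semaphore = new Semaphore(initialCount: 2, maximumCount: 2);
int concurrent = 0, peak = 0;
var sync = new object();

var threads = new Thread[6];
for (int i = 0; i < threads.Length; i++)
{
    threads[i] = new Thread(() =>
    {
        semaphore.WaitOne();   // blocks while both slots are taken
        lock (sync) { concurrent++; peak = Math.Max(peak, concurrent); }
        Thread.Sleep(20);      // simulate work while holding the slot
        lock (sync) { concurrent--; }
        semaphore.Release();   // free a slot for a waiting thread
    });
    threads[i].Start();
}
foreach (var t in threads) t.Join();

Console.WriteLine($"Peak concurrency: {peak}");
```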
So threads, as virtual logical CPUs, provide wonderful features, but as with any virtualization mechanism, threads mean memory consumption and time overhead, which can negatively affect your application's performance. In order to get a proper view of the impact of creating new threads, let's review in depth the actors involved in thread creation:
- Thread Kernel Object: currently every CLR logical thread maps to an OS physical one, so this data structure contains the main thread properties and is allocated by the operating system every time a new thread is created. In particular, this structure contains the thread's context, which is a memory block holding the CPU's registers; this block is about 700 bytes on x86 systems and 1,240 bytes on x64 systems.
- User-Mode Stack: Windows reserves about 1 MB for this stack, and it is where all local variables and method arguments are allocated.
- Kernel-Mode Stack: this space is used when the application passes arguments to a kernel-mode function of the operating system; for security reasons, all local variables and arguments from the user-mode stack are copied to the kernel-mode stack.
- Thread Environment Block (TEB): another structure to be populated, which contains the head of the thread's exception-handling chain and the thread's thread-local storage data, in addition to other structures.
- DLL Thread Attach/Detach notifications: finally, when a new thread is created in a process, every DLL loaded in that process has its DllMain() method invoked with the corresponding DLL_THREAD_ATTACH/DLL_THREAD_DETACH flag.
In the past, when common applications loaded three to five DLLs in memory, this was not an issue; but for current applications (which can have nearly 250 DLLs loaded) the situation is very different, and creating and destroying threads can seriously affect your application's performance. Along this line, it is worth mentioning that DLLs produced by the C# compiler do not implement the DllMain() method, so performance is indirectly improved.
All these structures are involved in thread creation and affect application performance, but they are not the only ones...
Another issue related to the use of threads appears when we consider context switching. Context switching is the mechanism used by Windows for sharing the CPUs among all the threads that are waiting to do their work. At any given moment, Windows assigns a specific time slice to each thread; after that, Windows context-switches to the next thread. Every switch means a pure performance hit in terms of the following tasks: first, all the values in the CPU's registers from the current thread's context are saved into the thread's kernel object; then Windows selects the next thread to be scheduled; if that thread happens to belong to another process, Windows also has to switch the virtual address space; after that, all the values of the selected thread's context are loaded into the CPU's registers.
It is known that Windows performs a context switch about every 30 ms, so the inherent performance impact is quite obvious.
...the first conclusion is quite obvious: use threads responsibly and only when necessary. Try to keep the number of threads to the minimum possible, because they imply a lot of memory consumption and they require time to be created and destroyed; in most cases it is preferable to use the CLR Thread Pool, or even some of the high-level constructs like Task, APM, etc., allowing the CLR to manage thread creation. It is also worth mentioning that a computer with multiple CPUs is able to run multiple threads simultaneously, providing good scalability to your system, so a proper balance has to be achieved between the minimum number of threads and the scalability and responsiveness of our applications. Fortunately, Microsoft provides several mechanisms for achieving that goal.
Traditional APM Pattern
APM stands for Asynchronous Programming Model and is the traditional pattern for implementing asynchronous operations in .NET applications. Asynchronous operations make the most sense when we are going to perform I/O-bound operations like accessing a file or device for reading/writing, socket communications, web requests or even database requests; this model allows part of our code to be executed on a different thread than the caller's, which can be considered the basis for creating high-performance, scalable applications.
.NET Classes Implementing APM
This pattern is supported by many .NET types related to I/O-bound operations, like System.Net.Dns, System.Net.Sockets.Socket, System.Net.WebRequest, System.Data.SqlClient.SqlCommand and all System.IO.Stream-derived classes like FileStream or NetworkStream, by providing Begin[MethodName] and End[MethodName] versions of the synchronous methods that represent an I/O-bound operation. (In this respect there are some exceptions where APM is also implemented for specific compute-bound operations, like BeginRead()/BeginWrite() on the MemoryStream class, which do not perform any actual I/O.) Along the same line, it is worth mentioning that all delegates implement a BeginInvoke method to be used in APM scenarios, and all traditional web service proxies (ASMX) also implement this pattern, allowing asynchronous calls to web service methods.
IAsyncResult & AsyncCallback
This splitting of a synchronous method into asynchronous ones for I/O-bound operations introduces two new important entities: IAsyncResult and AsyncCallback. Let's compare the signature of the FileStream.Read() method with its asynchronous version.
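For reference, the two FileStream signatures being compared are:

```csharp
// Synchronous version: blocks the caller until the read completes.
public override int Read(byte[] array, int offset, int count);

// APM version: returns at once; the IAsyncResult identifies the pending I/O.
public override IAsyncResult BeginRead(byte[] array, int offset, int numBytes,
                                       AsyncCallback userCallback, object stateObject);
```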
As can be seen from both signatures, the first three parameters are the same; in addition, the asynchronous version includes two extra ones: userCallback, of the AsyncCallback delegate type, which usually wraps the method or lambda expression to be invoked when the parallel execution completes (this parameter is only used in the callback pattern); and stateObject, which is a reference to any object you want to forward to the callback method, accessible there through the IAsyncResult interface.
public delegate void AsyncCallback(IAsyncResult ar);
IAsyncResult is the output of BeginRead(), and this interface is the input parameter of the EndRead() method when completing the execution. In addition, it is also useful in the polling pattern for asking the background operation about its status through the IAsyncResult.IsCompleted property. Calling BeginRead() constructs an instance that identifies the I/O-bound operation and queues it up in the Windows device driver, which in turn returns a reference to the IAsyncResult.
Traditional Rendezvous Models
From an operational, high-level perspective we had three different models for implementing traditional APM in our applications, known as the rendezvous models: Wait-Until-Done, Polling and Callback. In the Wait-Until-Done pattern we make the asynchronous call and then the caller thread performs its next block of operations; after this work is done you try to end the call, and the main thread is blocked until the operation completes. The Polling pattern is very similar, but in this case the main thread keeps executing code until the background thread completes the async operation, which is achieved by repeatedly querying the IAsyncResult.IsCompleted property. In the two previous patterns there is no need to provide the AsyncCallback and state object when calling the Begin-operation methods; these parameters belong to the scope of the third rendezvous pattern, the Callback one. In this pattern we specify the callback method to be invoked when the background execution completes, provided as the userCallback parameter of BeginRead(); in addition, we need to provide any state required to complete the call, and usually this parameter is the very object that exposes the asynchronous methods.
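A minimal sketch of the Wait-Until-Done rendezvous using FileStream.BeginRead()/EndRead() (the temp-file path and contents are invented so the example is self-contained):

```csharp
using System;
using System.IO;
using System.Text;

// Prepare a small sample file for the example.
string path = Path.Combine(Path.GetTempPath(), "apm-sample.txt");
File.WriteAllText(path, "Hello APM");

var buffer = new byte[32];
string text;

using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    // Start the read; no callback or state object is needed for this pattern.
    IAsyncResult ar = fs.BeginRead(buffer, 0, buffer.Length, null, null);

    // ...the caller could perform unrelated work here...

    int bytesRead = fs.EndRead(ar); // blocks until the I/O completes
    text = Encoding.UTF8.GetString(buffer, 0, bytesRead);
}
Console.WriteLine(text);
```

The Polling variant would loop on `ar.IsCompleted` before calling EndRead(); the Callback variant would pass a delegate and the state object instead of the two nulls.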
Today this pattern has been replaced by the async/await one, which provides specific solutions to well-known issues of APM (like exception management), and from a nearer perspective by well-known wrappers of APM like Task; we will describe this pattern later.
Some APM Drawbacks
Although the APM principles are easy to understand, the effective use of traditional APM is not straightforward and presents some shortcomings; this is the main reason why a lot of developers avoid using this pattern. Among others, we can identify the following:
- Exception management: you have to handle the exceptions raised by Begin[MethodName] in the End[MethodName] part, but this presents problems because, with several concurrent asynchronous executions, in some very specific scenarios not all exceptions are caught.
- Splitting your code into callbacks.
- We have to avoid using local variables and arguments, because they are allocated on the caller thread's stack, which is not accessible from other threads.
- Some common C# constructs are hard or impossible to use across the Begin/End split, like try-catch, do-while, foreach, using, etc.
APM is a powerful tool for making your applications scalable, and when coupled with the .NET Thread Pool, asynchronous operations allow taking full advantage of all the CPUs in the machine; but from an internal point of view there are some drawbacks worth keeping in mind when using APM:
- Memory consumption: every time we call Begin[MethodName], an instance of a type implementing the IAsyncResult interface is constructed, and this is repeated for every asynchronous operation we need to perform, which means more overhead and more objects on the heap, causing more garbage collections to occur.
- Cancellation, concurrent coordination, timeouts, etc. are really hard to implement using traditional APM.
The Event-Based Asynchronous Pattern (EAP)
This pattern was introduced by Microsoft only to make things easier for Windows Forms developers, and it has been a controversial focus of discussion because many people disagreed with including EAP as part of the .NET Framework. Microsoft thought that the IAsyncResult-based APM was too difficult for Windows Forms developers, so they decided to include EAP.
The reality is that EAP forces some I/O classes to provide coverage for two different asynchronous models that perform the same functionality, so it adds more complication than it solves. One of the "big" advantages of this pattern is its support for Visual Studio designer surfaces, generating the needed code when you drag and drop controls like WebClient. Behind the scenes, EAP uses the SynchronizationContext class, so it has the ability to understand the threading model of the current application; this way it is able to ensure that the event handler method is executed on the GUI thread.
In EAP you commonly invoke the asynchronous version of a specific I/O method using [MethodName]Async. Before this invocation you should define and register an EventHandler for the completed event of the class exposing the I/O method, which usually has the name [MethodName]Completed; it is executed on the GUI thread, guaranteeing that the handler has full access to UI controls for presenting output information.
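WebClient is the canonical EAP example, but it needs network access; BackgroundWorker follows the same event-based shape and can be sketched offline (the doubling logic and the input value 21 are, of course, just an illustration):

```csharp
using System;
using System.ComponentModel;
using System.Threading;

var done = new ManualResetEventSlim(false);
int result = 0;

var worker = new BackgroundWorker();
worker.DoWork += (sender, e) =>
{
    // Runs on a thread-pool thread.
    e.Result = (int)e.Argument! * 2;
};
worker.RunWorkerCompleted += (sender, e) =>
{
    // In a GUI application the captured SynchronizationContext would
    // marshal this handler back onto the UI thread.
    result = (int)e.Result!;
    done.Set();
};

worker.RunWorkerAsync(21);  // the [MethodName]Async-style call
done.Wait();
Console.WriteLine(result);
```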
APM vs EAP
Frequently, EAP classes are internally implemented using APM ones, so APM is closer to the metal than EAP. This usually means that EAP objects perform slower than their APM equivalents and require more memory; the main benefit of EAP is its support for the Visual Studio designer surface, creating a design-time implementation for invoking asynchronous operations. Another facility commented earlier is that the EAP model guarantees that the completed handler is executed by the GUI thread, which allows passing output data to UI controls.
On the other hand, EAP must allocate all the internal classes for progress reporting and completion events, plus other related classes commonly used to identify the different operations, etc.; this means more memory consumption, which can be a problem when building a high-performance server application (obviously the situation is different for a local WinForms/WPF application, where this performance penalty is insignificant). Also, events are cumulative, so you need to unregister/register handlers if you want the next asynchronous operation to invoke a different method.
Error handling is another weak point of EAP: exceptions are not thrown, so you have to query the AsyncCompletedEventArgs.Exception property directly to see whether there was any exception. Even more, if there is an exception there, you first need to determine its type in order to implement the exception handling.
CLR Thread Pool
As we have seen, creating and destroying threads is an expensive operation in terms of time and memory consumption; in addition, having a lot of threads in memory increases context switching, which also hurts performance. In order to resolve this situation, the CLR implements its own thread pool, which can be considered a smart thread-management mechanism (enhancements are implemented with each new version) that handles the creation, reuse and destruction of threads on demand, based on CPU utilization, memory availability, etc. The CLR Thread Pool can be considered a set of threads that your application can use: instead of creating a new thread you can take an existing one from the pool, which is faster, and after using it the thread is not destroyed; instead it is freed and returned to the pool, allowing another operation to use it. There is a single default thread pool per CLR instance, and it is shared by all AppDomains under that CLR.
The CLR Thread Pool balances the equation between having as few threads as possible, to avoid wasting resources and time, and having more threads, to get full hardware coverage on multiprocessor and multi-core machines. From an internal point of view, the thread pool manages a queue of execution requests: when we need to execute an asynchronous operation, we can use the QueueUserWorkItem() method to enqueue the corresponding work item; after that, the thread pool selects a specific thread to perform the operation. If there is no free thread in the pool, a new thread is created, and if many concurrent requests arrive and there are not enough threads to serve all of them, additional threads are created.
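A minimal QueueUserWorkItem sketch (the arithmetic is arbitrary); note that the completion signal is hand-rolled, which foreshadows the "no built-in completion" limitation discussed below:

```csharp
using System;
using System.Threading;

var done = new ManualResetEventSlim(false);
int result = 0;

// Enqueue a work item; the pool supplies (or creates) a thread to run it.
ThreadPool.QueueUserWorkItem(state =>
{
    result = (int)state! + 41;
    done.Set();   // we must signal completion ourselves
}, 1);

done.Wait();
Console.WriteLine(result);
```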
From an operational point of view, the thread pool manages two kinds of threads: worker and I/O threads. The first type is commonly used for performing compute-bound operations, and the second ones are used for I/O-bound operations like accessing a file for reading/writing, accessing a logical drive, network communications, web requests, database requests, etc.
Task Parallel Library
The CLR Thread Pool is a valid low-level mechanism for implementing parallelism in your applications. Using the ThreadPool's QueueUserWorkItem() method to start an asynchronous compute-bound operation is quite straightforward, but it has two main drawbacks: the first is that there is no built-in way to know when the operation has completed, and the second is that there is no way to get a return value back when the execution ends. In addition, cancelling a thread-pool work item is hard and requires low-level knowledge from the developer about how to use CancellationTokenSource, etc. To solve all these limitations, Microsoft introduced the concept of Task and, from a broader perspective, the Task Parallel Library (TPL).
So currently it is possible (and advisable) to use this high-level library for implementing parallelism and concurrency in your .NET applications in a very easy way, avoiding the traditional complexities and issues derived from managing threads and APM directly (thread synchronization, etc.).
This library enhances traditional parallel programming by providing a new runtime, new library types and new diagnostic tools that help write efficient and scalable parallel code in a very simple way, allowing developers to focus on writing the code their application needs while letting the system make the most efficient use of all the available processors. TPL comprises a set of classes living in the System.Threading and System.Threading.Tasks namespaces. This high-level library solves the issues related to task cancellation, exception handling, state management, etc. Behind the scenes, the TaskScheduler, which is responsible for executing tasks, uses threads from the thread pool to do the work.
As commented earlier, TPL is based on the Task concept, which represents an asynchronous operation similar to a Thread, or even a work item in the thread pool, but at a higher level. Tasks provide two main advantages when dealing with parallel code: a more efficient and scalable use of resources, and more control than a thread or a thread-pool work item (as commented earlier, TPL provides cancellation support, structured and robust exception handling, state management, etc.).
Tasks can be created explicitly or implicitly. To implicitly create tasks you can use a construct like Parallel.Invoke, which allows running a number of arbitrary statements concurrently; the only thing we need here is an Action delegate or equivalent lambda expression argument for each "work item". When creating a Task explicitly we need to consider whether the task is going to return a value: if there is no value to return we will use the System.Threading.Tasks.Task class, otherwise System.Threading.Tasks.Task<TResult>, which inherits from the former. When creating a new task explicitly we will also use an Action delegate, which can be expressed as a named delegate, an anonymous method or a lambda expression, which in turn can contain a reference to a named method (Task taskItem = new Task(() => Console.WriteLine("This is a proof"));).
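The implicit and explicit creation styles can be sketched side by side (the numbers are arbitrary):

```csharp
using System;
using System.Threading.Tasks;

// Implicit: Parallel.Invoke runs the delegates concurrently and
// returns only when all of them have finished.
int a = 0, b = 0;
Parallel.Invoke(
    () => a = 10,
    () => b = 20);

// Explicit, no return value: Task.
var plain = new Task(() => Console.WriteLine("This is a proof"));
plain.Start();
plain.Wait();

// Explicit, with a result: Task<TResult>.
var sum = new Task<int>(() => a + b);
sum.Start();
Console.WriteLine(sum.Result); // Result waits for completion, then yields 30
```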
The Task object is easier to use than its low-level counterparts, Thread or a thread-pool work item, and provides access to properties and methods that simplify complex operations; we can read the Status property at any given time to get the current TaskStatus, and this way we can know whether the task is running, has been cancelled, or even whether it has thrown any exception.
In addition, we can use the static Task.Run() method to create and start a new Task in a single step; this method relies on the default TaskScheduler and offers overloads for passing a CancellationToken.
Finally, we can also use a TaskFactory to create tasks, using its StartNew() method to create and start a task in a single operation. This mode is commonly used when there is no need to separate the scheduling from the creation of a task, or when state needs to be passed into the task through its AsyncState. The eventual result of the execution is available through the Task<TResult>.Result property.
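The two single-step creation styles mentioned above, sketched with arbitrary arithmetic:

```csharp
using System;
using System.Threading.Tasks;

// Task.Run: create and start on the default scheduler in one call.
Task<int> viaRun = Task.Run(() => 6 * 7);

// TaskFactory.StartNew with a state object; the state remains
// visible afterwards through the task's AsyncState property.
Task<int> viaFactory = Task.Factory.StartNew(state => (int)state! * 2, 21);

Console.WriteLine($"{viaRun.Result} {viaFactory.Result}");
Console.WriteLine(viaFactory.AsyncState); // 21
```

Reading Result blocks until the task completes, so both lines print final values.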
The introduction of the Task concept provides a great deal of flexibility and features for modeling asynchronous compute-bound operations, which makes things easier for developers, but some memory considerations have to be taken into account: every Task instance carries a collection of fields that define the complete task state, among them the task Id property, a task parent reference, a TaskScheduler reference, the State property, a reference to the callback method, a reference to the object to be passed to the callback method, an ExecutionContext reference, and some other references related to supplementary task state, including CancellationToken, continuation tasks, etc. So, despite the richness of features and the simple use of TPL, there are still special situations in which it is preferable to keep using ThreadPool.QueueUserWorkItem for a more efficient implementation in terms of memory consumption.
Although TPL was initially introduced to provide a flexible model for executing asynchronous compute-bound operations, TaskFactory and TaskFactory<TResult> also offer overloads of the FromAsync() method that allow us to model the I/O-bound operations traditionally managed by APM; in fact, an APM Begin/End method pair can be wrapped in a single Task or Task<TResult> instance (the Task class implements the previously mentioned IAsyncResult interface, so the Task AsyncState property is the IAsyncResult AsyncState property).
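A sketch of wrapping FileStream's Begin/End pair into a Task<int> with TaskFactory.FromAsync (the temp-file path and contents are invented for the example):

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

string path = Path.Combine(Path.GetTempPath(), "fromasync-sample.txt");
File.WriteAllText(path, "Hello TPL");

var buffer = new byte[32];
string text;

using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    // FromAsync pairs BeginRead/EndRead into a single Task<int>
    // whose Result is the number of bytes read.
    Task<int> readTask = Task<int>.Factory.FromAsync(
        fs.BeginRead, fs.EndRead, buffer, 0, buffer.Length, null);

    text = Encoding.UTF8.GetString(buffer, 0, readTask.Result);
}
Console.WriteLine(text);
```

The resulting Task<int> can be waited on, continued with ContinueWith(), or awaited like any other task, which is the whole point of the wrapper.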
Async / Await Pattern:
Async/await is a feature introduced in .NET Framework 4.5 that allows implementing asynchronous I/O-bound operations (working with files, network connections, database connections, logical drives, etc.) in a simple, elegant and intuitive way. It provides syntax for writing asynchronous code in a way very similar to synchronous code. This model is easier to use and more intuitive for the developer than the traditional APM pattern. Most of the current I/O classes in Framework 4.5 provide support for async/await in a similar way to the traditional BeginXxx/EndXxx APM pattern; these methods can be recognized by the suffix Async attached to the common method name and by their return type of Task or Task<TResult>. For instance, the asynchronous versions of the Read(), Write() and CopyTo() methods of the Stream class are ReadAsync(), WriteAsync() and CopyToAsync().
With this pattern, the caller thread is not blocked until the I/O-bound operation completes; instead, the call returns a Task or Task<TResult> object that identifies the outcome of the asynchronous operation. By chaining a continuation task you can complete the execution after the I/O-bound operation ends; this way the caller thread is not sitting idle waiting for the I/O-bound execution to finish, but is returned to the thread pool, able to execute other tasks. After the I/O-bound execution ends, the continuation task is processed by another thread from the thread pool.
When creating new asynchronous methods we use the async keyword to tell the compiler that our code will contain asynchronous operations; the compiler then transforms the scoped code into a state machine. Marking our methods with async allows the code to be split into multiple pieces, each of them bounded by the await keyword. When marking a point in the code with await, the compiler generates code to check whether the asynchronous operation is already completed: if it is, execution continues running synchronously; if not, the state machine hooks up a continuation task that will be executed when the operation ends. The await keyword allows writing code that looks synchronous but behaves asynchronously.
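Putting it together, here is a minimal async method (the file name and contents are invented) whose await point releases the caller's thread while the I/O is pending:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// The compiler rewrites this method into a state machine; at the await,
// control returns to the caller until ReadToEndAsync() completes.
static async Task<string> ReadUpperAsync(string path)
{
    using var reader = new StreamReader(path);
    string content = await reader.ReadToEndAsync(); // non-blocking wait
    return content.ToUpperInvariant();
}

string path = Path.Combine(Path.GetTempPath(), "await-sample.txt");
await File.WriteAllTextAsync(path, "hello async");

string result = await ReadUpperAsync(path);
Console.WriteLine(result);
```

Notice how the using block, the local variables and the return statement all read like ordinary synchronous code, which is exactly the constructs that were awkward or impossible under the APM callback style.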