Multithreading in COM
by Binh Ly
Consider a simple EXE COM server, FooServer. Let's assume FooServer exposes
an object Foo that has a method Bar:
IFoo = interface
  procedure Bar;
end;
Let's also assume that FooServer has no clue what multithreading is and is
therefore single-threaded just like any normal application. Consider 3 clients
(C1, C2, and C3) each of which creates an instance of Foo from FooServer. All 3
clients then call IFoo.Bar simultaneously (assume C1 got through
first, followed by C2, then C3). Since FooServer is single-threaded, it will
process C1's call to completion, and then process C2's call to completion, and
then process C3's call. Why? Single-threading enforces a single path of
execution, meaning that FooServer can only process calls sequentially, one after
the other.
Assuming that IFoo.Bar takes a minute to execute, C3 will take at least 3
minutes to execute IFoo.Bar. This is because it has to wait until C1 and C2 are
completed, each of which takes a minute. Wouldn't it be better if the server could
be a bit fair and somehow divide its time equally for the 3 clients since all 3
simultaneously called into the server anyway? Or better yet, if FooServer was
run on a kick-ass 10 processor machine, wouldn't it be cool if all 3 clients
were processed simultaneously on 3 separate processors?
A simple way to improve FooServer is to create a thread for each instance of
Foo. Using our example, C1 would get a Foo on server thread 1 (T1), C2 would get
a Foo on server thread 2 (T2), and C3 would get a Foo on server thread 3 (T3).
This way, when all 3 clients simultaneously call IFoo.Bar, T1, T2, and T3 would
all kick in to do the work simultaneously. And if you had a multiprocessor
machine running FooServer, T1, T2, and T3 can each execute on its own processor!
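To put rough numbers on the difference, here is a small Python sketch. It is a toy model rather than COM code, and it assumes the idealized case where every server thread gets a processor to itself. The function name completion_minutes and its parameters are made up for this illustration.

```python
def completion_minutes(num_clients, call_minutes, worker_threads):
    """Return the minute at which each client's call finishes.

    All calls arrive together and each is handed to whichever
    worker thread becomes free first.
    """
    free_at = [0] * worker_threads              # when each worker is next idle
    finished = []
    for _ in range(num_clients):
        worker = free_at.index(min(free_at))    # earliest available worker
        free_at[worker] += call_minutes
        finished.append(free_at[worker])
    return finished


# Single-threaded FooServer: C1, C2 and C3 queue up behind each other.
print(completion_minutes(3, 1, worker_threads=1))   # [1, 2, 3]
# One thread per Foo instance: nobody waits.
print(completion_minutes(3, 1, worker_threads=3))   # [1, 1, 1]
```

With one worker, the third client finishes at minute 3; with three workers, every client finishes at minute 1.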
Now that we have multithreaded FooServer, let's try to simplify and formalize
a few things. Consider the Foo (Foo1) instance on T1. Since Foo1
"belongs" to T1, it would make sense to simplify our lives a bit by
making T1 be "in charge" of Foo1, i.e. an outside call into any of
Foo1's methods must execute on T1, and only on T1. Also, our idea of FooServer
creating a separate thread per instance of Foo might not always be a good idea.
Why? If a hundred clients each created Foo, that would instantly create a
hundred threads in FooServer. For a thousand clients, that's a thousand threads;
for a hundred thousand clients, that's a hundred thousand threads, and so forth.
You get the idea! What would be better is to instead allow each thread to
possibly "hold" more than 1 instance of Foo resulting in a form of
thread reuse. For example, say we have 9 clients connecting to
FooServer, each of which creates an instance of Foo. We can still fit the 9
objects into threads T1, T2, and T3 in a more ingenious way:
Figure: Distributing 9 Foo objects into 3 threads
As we can see, each thread holds 3 Foo instances, so 3 threads can
hold 9 Foos and thus satisfy all 9
clients. This way, we avoid the waste of resources that our one-thread-per-Foo-instance rule would have caused. However, because we simplified things a little
bit, instances that "belong" to a single thread are all serviced on that
thread, and only on that thread. Foo1, Foo2, and Foo3 are all serviced on T1
meaning clients that simultaneously call into Foo1, Foo2, and Foo3 are
sequentially serviced (one after the other) on T1. The same behavior applies for
T2 with Foo4, Foo5, and Foo6, and also for T3 with Foo7, Foo8, and Foo9.
Let's take a closer look again at what we've just seen:
Figure: 3 single threaded apartments in a server
Each box in the above diagram is what COM calls the
Single Threaded Apartment (STA). Surprisingly simple, isn't it?
Let's break the term down to 2 parts: "apartment" and "single
threaded". "Apartment" is used to describe a place for a group of
objects "living" together and sharing the same threading behavior and
requirements. An apartment is an abstraction of "a container of objects and
threads" similar to how a thread is an abstraction of "a path of
execution".
"Single threaded" is used to denote that the threading behavior for
this particular apartment is such that there is 1 thread, and only 1
thread, that executes in the apartment, ever. Got it? If not, read the entire
paragraph three more times.
Here are a few important characteristics of the STA:
- There is 1 and only 1 thread in an STA. All objects that reside in the STA
are serviced on this one thread.
- A server can implement multithreading by creating
multiple STAs. Since an
STA contains 1 thread, multiple STAs result in multiple threads. Furthermore,
each STA can "house" more than 1 object. This prevents the server
from excessively creating a lot of STAs!
Because an STA contains only 1 thread, it is said to be serialized by
default. What this means is that if 2 clients simultaneously call into
objects residing in the same STA, the STA architecture will ensure
that the 2 calls are serialized (processed sequentially, one after the
other).
Figure: Default STA serialization
This serialization behavior in the STA is implemented by COM using the
standard Win32 window messaging architecture. What happens is COM interjects a
hidden window between the client and the object in the STA. Whenever the client calls into an
object in the STA, COM intercepts the call, bundles it into a Windows message,
and then posts the message into the message queue of the hidden window. Once the
hidden window gets to the message in its queue, it unbundles the message and
then makes the actual call into the STA object. This is all made possible
because the hidden window runs on the one thread that lives in the STA. In order
for the hidden window to receive and handle calls/messages, it must continuously
check and process its message queue. This means that the STA thread is
required to run a message pump (a process which checks and handles messages
from a message queue) in order to live.
Figure: COM STA architecture
If the STA thread does not pump messages properly, the hidden window won't
see the messages and thus, would not process them. In this case, a client making
a call into an STA object will appear to be hung/unresponsive simply because the
call is not coming through into the server!
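The queue-and-pump arrangement described above can be modeled in a few lines of Python. This is only a sketch of the idea, not how COM is implemented: a queue.Queue stands in for the hidden window's message queue, and the class and method names (SingleThreadedApartment, call, shutdown) are invented for the illustration.

```python
import queue
import threading
from concurrent.futures import Future


class SingleThreadedApartment:
    """Toy model of an STA: one thread, one queue, calls run one at a time."""

    def __init__(self, name):
        self._messages = queue.Queue()   # stands in for the hidden window's queue
        self.thread = threading.Thread(target=self._pump, name=name, daemon=True)
        self.thread.start()

    def _pump(self):
        # The "message pump": take the next bundled call and run it on this thread.
        while True:
            message = self._messages.get()
            if message is None:          # shutdown request
                break
            func, args, result = message
            try:
                result.set_result(func(*args))
            except Exception as exc:     # hand failures back to the caller
                result.set_exception(exc)

    def call(self, func, *args):
        """Post a call into the apartment and block until it has run."""
        result = Future()
        self._messages.put((func, args, result))
        return result.result()

    def shutdown(self):
        self._messages.put(None)
        self.thread.join()


class Foo:
    def bar(self):
        # Report which thread actually executed the method.
        return threading.current_thread().name


def demo():
    sta = SingleThreadedApartment("T1")
    foo = Foo()
    seen = []
    clients = [
        threading.Thread(target=lambda: seen.append(sta.call(foo.bar)))
        for _ in range(3)
    ]
    for client in clients:
        client.start()
    for client in clients:
        client.join()
    sta.shutdown()
    return seen


print(demo())   # ['T1', 'T1', 'T1']
```

Whichever client thread makes the call, Bar always runs on T1. If the pump loop stopped taking items from the queue, every call to sta.call would block forever, which is the "hung client" symptom described above.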
Now that you've seen what an STA is, let's look at how FooServer would
implement STAs for its Foo object. Recall our previous discussion on the class factory.
To refresh your memory, the class factory is the standard "gatekeeper"
used to create instances of objects from your server, i.e. when a client wants
to create an object, it first goes to the class factory, and the class factory
creates the actual object. Because of this, it
makes sense to somehow associate the creation of threads with the class factory
so that we can control how threads are created whenever the class factory creates an
object. For example, the following pseudocode provides a simple implementation of a
thread-per-object allocation for Foo (also recall that the
IClassFactory.CreateInstance method is where object instances are created):
// CreateInstance is called every time a client asks to create a new Foo instance
function FooClassFactory.CreateInstance : IFoo;
begin
  // create new thread
  NewThread = CreateANewThread;
  // create new Foo on new thread
  NewFooInstance = CreateAnInstanceOfFoo on NewThread;
  // return newly created Foo to client
  Result = NewFooInstance;
end;
What's happening here is that every time a client requests to create a new Foo
instance, Foo's class factory will create a new thread and then create an
instance of Foo on that new thread. But wait, where's the STA here? All we've
created is a new thread and a new Foo instance, but not the apartment.
The answer's actually simple. It's the thread that decides what type of
apartment it wants! It does this by calling the CoInitializeEx (or CoInitialize
in the old days) COM API. Each thread in your application that wants to work with COM must
request to initialize an apartment so that COM knows how to work with that
thread! This is a very important rule because an apartment is characterized
by how tough it is when it comes to handling multiple threads. In other words,
an apartment defines what can and cannot be done with the objects and threads
contained in it. The process of a thread initializing an apartment means that
that thread wants to work only within the boundaries of its apartment's rules.
An apartment is a real, living entity
in COM! For each apartment, COM needs to allocate resources in order for the
apartment to function properly. CoInitializeEx tells COM to do whatever needs to be
done to set up an apartment for the calling thread. CoInitializeEx must be
called prior to any code in a thread that interacts with COM. This usually means
that a thread must call CoInitializeEx as its first statement. For our
FooClassFactory, the new thread would initialize as follows if it wants the STA:
// Thread routine created from the line NewThread = CreateANewThread
// in FooClassFactory.CreateInstance
// initialize apartment
CoInitializeEx (SingleThreaded);
... do work here ...
// leave apartment
CoUninitialize;
The SingleThreaded parameter indicates that the new thread is interested in
initializing itself into an STA. The CoUninitialize call towards the end is
important and indicates that the thread is terminating and tells COM to do any
cleanup of resources that were allocated when the apartment was initialized.
In order for our new thread to be a bona fide STA service thread, we also need
a message pump, so more accurate pseudocode would look like this:
// message pump
while MoreMessagesInQueue do
  ProcessNextMessage; // hypothetical helper: dispatch one queued call
Important note: The message pump is only necessary for an STA that contains
objects to be used from other apartments, i.e. if the STA serves objects to
clients. For STAs that simply create an object, call a method, and release the
object, there is no need for a message pump.
Another important thing to remember when working with STAs is proper
synchronization. A single STA is itself synchronized on its sole thread. An object within
the STA does not need to do any extra synchronization on its instance data because the STA architecture does the synchronization for you for free. However,
multiple STAs (meaning multiple threads) can easily step on each other when
accessing global data. Because of this, objects from different STAs must properly
synchronize among themselves when accessing global data:
Figure: Multiple STAs must properly synchronize access to shared data
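Thus, the correct way to handle global data is to use a locking mechanism
when accessing the data. This pseudocode shows how a simple locking mechanism is
performed using a Win32 critical section:
procedure Foo.Bar;
begin
  EnterCriticalSection (GlobalLock); // GlobalLock: hypothetical shared critical section
  AccessGlobalData;
  LeaveCriticalSection (GlobalLock);
end;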
This ensures that if 2 Foo objects, each on different STAs,
simultaneously execute Bar, the
AccessGlobalData routine will only execute one at a time preventing any
possibilities of data corruption.
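The same locking idea can be tried out in Python, with threading.Lock playing the role of the Win32 critical section. This is a minimal sketch under that substitution. Each thread below stands in for a separate STA, and the names global_counter, global_lock and run are invented for the example.

```python
import threading

global_counter = 0                  # data shared by every "apartment"
global_lock = threading.Lock()      # plays the role of the critical section


def access_global_data():
    global global_counter
    with global_lock:               # enter on the way in, leave on the way out
        current = global_counter
        global_counter = current + 1    # read-modify-write happens under the lock


class Foo:
    def bar(self, times):
        for _ in range(times):
            access_global_data()


def run(num_threads=4, times=10_000):
    """Run one Foo per thread, each thread standing in for a separate STA."""
    global global_counter
    global_counter = 0
    threads = [
        threading.Thread(target=Foo().bar, args=(times,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return global_counter


print(run())   # 40000
```

Because every increment happens while the lock is held, no update is lost and the total always equals threads times iterations.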
The automatic STA serialization has its advantages and disadvantages. You can
develop an object that doesn't need to worry about instance data synchronization
by simply putting it in the STA. However, the STA serialization imposes a few limitations:
- If you have objects that have no danger
(or no need) of instance data corruption across
threads, you don't really need the automatic serialization feature of the
STA. It'd just be unnecessary overhead. In this case, throughput can be much
better without the synchronization.
- A client (on a different thread) talking to an STA object (on another
thread) incurs some overhead. This is because whenever a client makes a call
into the object, COM has to do a thread-switch from the client thread to the STA
thread. What this means is that COM will temporarily "suspend" the
client thread, switch to the STA thread and then make the call, and then
switch back to the client thread. Thread switching is a rather expensive
operation and should be avoided if possible.
For developers who are not impressed with the limitations imposed by the STA, a
"new and improved" apartment is in order. This apartment must have the
following characteristics:
- No more automatic synchronization. This apartment will have the ability to
accommodate more than 1 thread. If multiple clients call into objects in
this apartment, all calls must proceed immediately and fearlessly!
- No more thread switching. All threads in this apartment can freely call
into any object within this apartment at any time they well damn wish!
Objects in this apartment will have no concept of being "owned" by
1 thread as in the STA.
Not surprisingly, this apartment is what COM calls the
Multithreaded Apartment (MTA).
"Multithreaded" is used to indicate that this apartment can have
multiple threads in it. Since any number of threads can live in the MTA, there's
need for only 1 MTA per application. Contrast this with an STA which can only
contain 1 thread which would require multiple STAs for multiple threads. Unlike
the STA, the MTA doesn't designate a particular thread that handles calls into
its objects. How, then, does the MTA know which thread to use to make calls into
its objects?
MTAs are different from STAs in that there is no hidden window, no messages,
and therefore no message pump required. The MTA's architecture is such that COM
will manage an internal pool of threads for each MTA. When a client makes a call into an
object in the MTA, COM will look into its thread pool, find an available thread
and make the call directly from that thread. This means that an MTA object can receive method calls from arbitrary
threads at any time. COM manages this
thread pool by growing or shrinking the number of threads in the pool as
needed.
Figure: COM MTA architecture
Since objects in the MTA receive calls from any thread at any time,
synchronization is harder than in the STA case. Instance or per-object data is no
longer safe, and any access to it that can cause corruption will need to
be properly synchronized. The same goes for global data as in the STA case. The
more important thing is that MTA objects must not be dependent on anything that
is thread-relative (or as the gurus like to say, MTA objects must not be
dependent on anything that has thread affinity). An example of a thread-relative
entity is a Win32 window handle; other examples are objects that have some
dependency on thread local storage (TLS).
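Here is a small Python sketch of what "calls arrive on any thread at any time" means for an object. It is a model only: a ThreadPoolExecutor stands in for the pool of threads COM uses to deliver calls, and the Foo class with its per-instance lock is an assumption made for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Foo:
    """An object meant for the MTA: it guards its own instance data."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = 0
        self.threads_seen = set()

    def bar(self):
        # Any pool thread may land here, so instance data is touched under a lock.
        with self._lock:
            self._calls += 1
            self.threads_seen.add(threading.current_thread().name)

    @property
    def calls(self):
        with self._lock:
            return self._calls


def run(num_calls=200, pool_size=4):
    foo = Foo()
    # The pool stands in for the threads COM uses to deliver incoming calls.
    with ThreadPoolExecutor(max_workers=pool_size, thread_name_prefix="mta") as pool:
        for _ in range(num_calls):
            pool.submit(foo.bar)
    # Leaving the with-block waits for every submitted call to finish.
    return foo


foo = run()
print(foo.calls)   # 200
```

Which pool threads end up running Bar varies from run to run, which is exactly why the object cannot rely on anything tied to one particular thread.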
A new thread that wishes to initialize the MTA can do so by calling
CoInitializeEx as follows:
// initialize apartment
CoInitializeEx (MultiThreaded);
... do work here ...
// leave apartment
CoUninitialize;
Note that we use the MultiThreaded parameter as opposed to SingleThreaded (STA)
to specify that we are interested in the MTA. Since there's only 1 MTA per
process, the first thread that calls CoInitializeEx (MultiThreaded) creates the
MTA whereas any similar succeeding calls from other threads will enter the
existing MTA. Note that you still have to match up calls with CoUninitialize
even though other threads simply enter the MTA because COM keeps track of the
thread count in the MTA.
The automatic synchronization in the STA and the fearless threading in the
MTA are COM's guarantee to the developer. What this means is that if you have a
weak object that cannot simultaneously handle multiple threads or is dependent
on thread-relative information, that object has to go into the STA. By putting
it in the STA, COM guarantees automatic synchronization, no more, no less. On
the other hand, if you have a tough object that's not afraid of a thread
beating, you can put it in the MTA. This way, COM will guarantee that it will
get a damn good beating if it needs to.
Aside from the COM guarantees, you must also guarantee to abide by COM's
rules. For instance, consider 2 STA threads, T1 and T2. Assume that T1 creates
an instance of Foo and stores it in a global variable T1Foo:
T1Foo : IFoo;
// an STA service thread for T1Foo
T1Foo = CreateAFoo;
Consider the adventurous T2 wanting to play with T1Foo:
// an STA thread (T2)
T1Foo.Bar; // raw call made directly from T2
From what we've learned, it is obvious that T1 and T1Foo
live in an STA.
More specifically, T1 is the STA's lone service thread. T2 lives in another STA.
Now look closely at T2. The raw call to T1Foo.Bar will be made from the context
of thread T2, i.e. 1) Bar won't be executed from within the context of thread
T1 and 2) it will bypass T1's message pump. Both of these are clear violations
of the STA model! This is what I mean when I say you also have to abide by COM's
rules.
How exactly do you abide by COM's threading rules in this case?
The answer lies in a process called interface marshaling. In
COM, an interface pointer is valid only in the apartment that acquired it, and
nowhere else. In our example, the T1Foo interface pointer is only valid in T1's apartment because that is where it was assigned from.
If you want to use an
interface pointer that's valid in an apartment different from yours (as is the
case with T2's STA using T1Foo), you have to ask COM to "massage" the
interface so that it will be valid in your apartment. This "massage"
process is called interface marshaling. The marshaling process involves taking
an interface pointer from the source apartment (apartment where pointer is
valid), converting it into a stream of bytes, shipping the stream to the target
apartment (apartment that wants to use it), and finally decoding the stream of
bytes back into a live interface pointer.
After the marshaling is performed, COM will set up (behind the scenes)
something called a proxy in the target apartment and a stub in the source
apartment. A proxy is nothing but a small object that exposes the same exact
interface as the original interface pointer does. The proxy and stub work in
tandem as apartment-to-apartment translators in such a way that the client in
the target apartment talks to the proxy, which in turn talks to the stub in the
source apartment, which
finally talks to the object.
Figure: COM Marshaling
Note: If you also work
with CORBA (or Java RMI), the marshaling terms might be confusing at first. CORBA's stub
is the equivalent of COM's proxy, and CORBA's skeleton is the equivalent
of COM's stub.
COM takes care of how the proxy talks to the stub. In fact if T2 had a proxy
to T1Foo, then the T1Foo.Bar call would go through the proxy, then to the stub,
then to the actual Foo instance in T1. Since the stub "lives" in T1's apartment,
the stub will make the call on T1 and T1's message pump will therefore pick the
call up from the queue and then execute Bar in T1's context. Pretty cool huh?!
In other words, using the proxy and stub mechanism is how we can work with COM
when attempting to use interface pointers across apartments.
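The difference between a raw call and a call through a proxy can be seen in a toy Python model. This is a sketch of the concept only. Real COM proxies and stubs are generated from interface definitions and do far more. The Apartment and Proxy classes below are invented for the illustration.

```python
import queue
import threading
from concurrent.futures import Future


class Apartment:
    """Toy STA: a single thread that runs calls taken from its queue."""

    def __init__(self, name):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._pump, name=name, daemon=True)
        self._thread.start()

    def _pump(self):
        while True:
            item = self._queue.get()
            if item is None:            # shutdown request
                return
            func, args, result = item
            try:
                result.set_result(func(*args))
            except Exception as exc:
                result.set_exception(exc)

    def invoke(self, func, *args):
        # The "stub" side: queue the call so it executes on the apartment's thread.
        result = Future()
        self._queue.put((func, args, result))
        return result.result()

    def close(self):
        self._queue.put(None)
        self._thread.join()


class Proxy:
    """Looks like the real object but forwards each method call to its apartment."""

    def __init__(self, target, apartment):
        self._target = target
        self._apartment = apartment

    def __getattr__(self, name):
        method = getattr(self._target, name)
        return lambda *args: self._apartment.invoke(method, *args)


class Foo:
    def bar(self):
        return threading.current_thread().name


def demo():
    t1 = Apartment("T1")
    t1_foo = Foo()                      # conceptually owned by T1's apartment
    proxy = Proxy(t1_foo, t1)           # what T2 would hold after unmarshaling
    results = {}

    def t2_body():
        results["raw"] = t1_foo.bar()   # violates the model: runs on T2
        results["proxy"] = proxy.bar()  # respects the model: runs on T1

    t2 = threading.Thread(target=t2_body, name="T2")
    t2.start()
    t2.join()
    t1.close()
    return results


print(demo())   # {'raw': 'T2', 'proxy': 'T1'}
```

The raw call executes on the caller's thread, while the proxied call is queued and executed by T1, which is the behavior the STA rules require.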
Interface marshaling can be accomplished using the CoMarshalInterface and
CoUnmarshalInterface APIs. CoMarshalInterface exports the interface pointer to a
byte stream and CoUnmarshalInterface imports the pointer from the byte stream.
CoMarshalInterface is normally called from the source apartment and then
CoUnmarshalInterface is called from the target apartment. Because the
CoMarshalInterface and CoUnmarshalInterface APIs require that you manually
allocate/deallocate the byte stream, developers normally prefer a more convenient pair of
APIs that automatically take care of the stream allocation/deallocation:
CoMarshalInterThreadInterfaceInStream and CoGetInterfaceAndReleaseStream. Here's
pseudocode showing how these APIs can be used for our example above:
MarshalStream : IStream;
// an STA service thread for T1Foo
T1Foo : IFoo;
T1Foo = CreateFoo;
// export/marshal T1Foo into byte stream
CoMarshalInterThreadInterfaceInStream (T1Foo, MarshalStream);
// an STA thread
T1Foo : IFoo;
// import/unmarshal T1Foo from byte stream
T1Foo = CoGetInterfaceAndReleaseStream (MarshalStream);
Note that the marshaling process has to
occur in sequence: T1 must execute CoMarshalInterThreadInterfaceInStream first
before T2 can meaningfully execute CoGetInterfaceAndReleaseStream!
Threading For In-Process Servers
In-process/DLL servers introduce some factors into the entire COM threading
business. This is because DLLs do not normally proactively create threads on
their own. Being passively mapped into the address space of the client
application, DLLs simply "blend" in with apartments and threads the
client creates. In other words, it is the client that actually creates
the threads and it is the client that makes the CoInitializeEx calls - objects
in the DLL simply go with the flow of what the client wants to do.
Why is this? Simple. DLLs simply blend into the client. Once a DLL is mapped
into the client's address space, it is no different than any other code that is
part of the client. An object in the DLL is no different than a raw object in
the client. In fact, making calls into an object in the DLL is exactly the same
as making calls into any other object in the client. In a sense, the client is
itself both a client and a server even though strictly speaking, the DLL is the
server.
Since DLLs do not proactively take part in creating threads, objects in a DLL
require a different mechanism of initializing themselves into the apartment of
their choice. For instance, a weak object has to somehow tell the client that it
can only live in an STA whereas a kick-ass object can tell the client that it
prefers to live in the MTA. Of course DLL objects don't tell clients, per se, of
their choice. Instead, when a DLL object gets registered,
it records its apartment of choice in a ThreadingModel registry value. For example,
if Foo prefers to live in the STA, here's what Foo theoretically looks
like in the registry:
CLSID of Foo
CLSID of Foo\ServerLocation = "FooServer.dll"
CLSID of Foo\ThreadingModel = "Apartment" // <== Foo prefers to live in the STA
What exactly does this mean? Whenever a client thread wants to create Foo,
COM first looks at the ThreadingModel value. If it finds that the client thread
belongs to an apartment that Foo prefers (in this case STA), then COM will
create Foo directly into the client thread's apartment.
Figure: Direct apartment
activation on compatible threading models for inproc objects. Foo1 is activated
directly into STA 1 and Foo2 is activated directly in STA 2.
If, on the other hand, the client thread belongs in a different apartment
(for instance if the client thread called CoInitializeEx (MultiThreaded)) than
the object's preference, COM will silently create an apartment that matches the
object and then marshal an interface pointer from that apartment into the client
apartment that requested to create the object. Why does COM do this?
Imagine a client thread T1 that has entered the MTA as follows:
FooVar : IFoo;
// T1 enters the MTA
CoInitializeEx (MultiThreaded);
FooVar = CreateObject ("FooServer.Foo"); // assume Foo's ThreadingModel is "Apartment"
... do some things ...
Imagine a second client thread T2 that enters the MTA at a later point:
// call Bar on FooVar variable initialized from T1
FooVar.Bar;
Unlike our previous example, T2's direct access to FooVar is perfectly legal.
Why? Because an interface pointer is valid from within the apartment where it
was acquired. In this case, both T1 and T2 are in the same apartment - the one and
only MTA. What you might not have noticed is that Foo declared a ThreadingModel="Apartment".
So if COM creates Foo directly into the MTA that contains both T1 and T2, T1
and T2 (and any other thread in the MTA) would be free to crush (make calls
into) Foo by virtue of
the MTA. But we cannot allow that since Foo says it prefers the STA, i.e.
"Do not allow multiple threads to simultaneously call me because I cannot handle it!"
This is exactly why COM will do some extra work to create Foo in an STA and then
hand back a marshaled pointer into T1 and T2's MTA. In simple terms, if the
client and the server have different threading requirements, COM will do some
extra work to ensure that both the client's and the server's wishes are granted.
Note though that
COM will not create a separate STA for each Foo that is incompatible with
the calling client thread. Instead COM will use a single STA to house all
of the incompatible Foo instances. COM will use the first or
"primary" STA that the client creates for this purpose. If the
client hasn't created any STAs yet, COM will automatically create one
when needed.
If Foo prefers to live in the MTA, it would register its ThreadingModel value
as "Free". What this means is that if a client thread is from the MTA,
COM will happily create Foo directly into the client's MTA. However, if the
client is STA, COM will also be a smart-ass and silently create an MTA on behalf
of Foo and hand back a marshaled interface pointer from the MTA into the client's
STA.
Again, COM will use
the one client MTA to house all incompatible Foo instances. If the client
hasn't created the MTA yet, COM will automatically create one when needed.
Foo can also be adventurous and be
indifferent to the client's apartment type. COM
also supports the "Both" ThreadingModel value meaning that Foo doesn't
care if the client's thread is in the STA or the MTA. If the client is in the STA,
COM will create Foo into the STA; if the client is in the MTA, COM will create
Foo into the MTA. In a sense, "Both" really means "Either"
STA or MTA.
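The activation rules described so far for "Apartment", "Free" and "Both" can be summarized as a small decision function. This Python sketch only encodes the rules as stated in this section. The function name activate and its return shape are invented for the illustration, and it ignores which particular STA COM picks to host an incompatible object.

```python
def activate(client_apartment, threading_model):
    """Model where an in-process object ends up and whether a proxy is needed.

    client_apartment: "STA" or "MTA" (apartment of the creating thread)
    threading_model:  "Apartment", "Free" or "Both" (the object's registered value)
    Returns (object_apartment, needs_proxy).
    """
    if client_apartment not in ("STA", "MTA"):
        raise ValueError(f"unknown apartment: {client_apartment!r}")
    preferred = {
        "Apartment": "STA",          # object insists on an STA
        "Free": "MTA",               # object insists on the MTA
        "Both": client_apartment,    # object goes wherever the client is
    }
    if threading_model not in preferred:
        raise ValueError(f"unknown ThreadingModel: {threading_model!r}")
    object_apartment = preferred[threading_model]
    # A mismatch means COM hosts the object elsewhere and marshals a pointer back.
    needs_proxy = object_apartment != client_apartment
    return object_apartment, needs_proxy


for model in ("Apartment", "Free", "Both"):
    for client in ("STA", "MTA"):
        print(model, client, activate(client, model))
```

Only the two mismatched combinations ("Apartment" object with an MTA client, "Free" object with an STA client) end up with a proxy between client and object.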
As we've discussed before, synchronization for "Apartment" involves
proper protection of global data and synchronization for "Free" or
"Both" involves proper protection of both instance-specific and global
data. Depending on the synchronization toughness level of your object, that's
one factor to determine which ThreadingModel value to use.
Costs of Interface Marshaling
Another important factor in deciding the ThreadingModel value is the
overhead involved in marshaling. Remember, marshaling involves a
proxy-stub connection and more importantly, an expensive thread-switch per
method call. How is this important?
Consider an object marked as ThreadingModel="Free". That's like
saying "I'm tough, I can handle anything!" Oh yeah?! If your clients
are mostly STA-based, a client that creates your object will always receive a
proxy because of the magic that COM does behind-the-scenes. A proxy kind of
defeats your tough object doesn't it? It would probably be better if your object
declared itself as ThreadingModel="Apartment". That way, you get no
magic from COM, and hence no stinking proxy!
Consider another scenario. You have an object marked as ThreadingModel="Apartment".
Let's say you have clients that are STA-based as well as clients that are MTA-based
- say evenly distributed (50/50). Your STA clients would be very happy, but your
MTA clients would be very sad due to the proxy syndrome. If you had declared
your object as ThreadingModel="Both", it would probably be the best of
both worlds for both STA clients and MTA clients.
But remember, marshaling overhead is only part of the picture. For instance,
it makes no sense to declare an object as ThreadingModel="Free" if it
depends on anything that is thread-relative. In this case, ThreadingModel="Apartment"
would be more appropriate!
Note: You can also mix
objects of different threading models within a single server. The
usefulness of this is relative to how your objects are being used.
However, it's good to point out that mixing objects with different
threading models is a possibility and can sometimes be beneficial.
Where Are We?
Whew! Multithreading is both a hard and easy topic in COM. Hard because it
takes time to see the big picture and easy because once you see the big picture,
everything just falls into place. My only comment is multithreading is not for
the faint of heart and should only be interesting to hard-core geeks. However,
knowledge gained in this chapter can always be useful in understanding future concurrency aspects of COM.
As we shall see later, COM+ and Windows 2000 will build on top of these basic
concepts to create a user-friendly approach to COM multithreading.
- Inside COM by Dale Rogerson
- Inside DCOM by the Eddons
- Essential COM by Don Box
- Essence of COM by David Platt
- Understanding COM+ by David Platt