Friendship Building & COM
by Binh Ly
When several smaller applications work together to solve a problem,
the solution is often better than that of a single application alone -
kind of like two minds are better than one.
Applications have evolved to such a point where
no single application can do everything anymore. Imagine developing a contact
manager application to keep track of all our friends. We all know that this application will need to store the contact information somewhere, and that
somewhere is commonly another application - the database server application.
This application may also need automated fax/email capabilities in case we want
to send jokes to our friends. For this, it's common sense that we'd want to find another application that provides fax/email capabilities and simply make friends
with that application. This application may also need mail-merge
capabilities in case we want to print nice party invitations for our friends.
For this, we'd probably want to enlist the help of another application
that does mail-merges in its sleep. This list can go on and on, but the
obvious fact here is that application friendliness is always good and healthy.
Being friendly implies being able to
communicate. Communication requires that friends talk with each other using a
common and standard language. This is important because we want applications
(past, present, and future) to be able to easily join the circle of friends
without any communication hassles. Under Windows (or any
message-based environment for that matter), a trivial communication mechanism
can be accomplished by sending messages between applications. Consider our
contact manager application (C) wanting to send email out to a group of contacts
using a messaging application (M). Using the Win32 messaging mechanism, C can send a
message to M, passing along whatever information M needs (target email address, email
message, etc.). M then receives the message from C, unpacks the message, and
then sends email out using the information obtained from the message:
Figure: Using a messaging
mechanism between Contact Manager and Messaging applications
Messaging is an excellent way to communicate
and has in fact been used successfully over the years. However, one
characteristic of messaging is that working with raw messages is sometimes terse
and not for the faint of heart. Complex messaging often involves passing
structures of information between applications and data conversions that can
get very tedious and error-prone.
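To give a feel for the tedium, here is a minimal sketch of the packing and unpacking that raw messaging forces on both applications. It does not send anything through Win32. The record layout (100 bytes for the address, 1024 bytes for the text), the names TMailMessage, PackMail, and UnpackMail, and the sample address are all assumptions made up for illustration.

```pascal
program RawMessageDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical layout that C and M must both agree on, byte for byte.
  TMailMessage = packed record
    Address: array[0..99] of AnsiChar;   // zero-terminated email address
    Body: array[0..1023] of AnsiChar;    // zero-terminated message text
  end;

// Sender side (C): squeeze the strings into the fixed-size fields.
procedure PackMail(const Address, Body: AnsiString; out Msg: TMailMessage);
begin
  FillChar(Msg, SizeOf(Msg), 0);
  // StrLCopy copies at most MaxLen characters and then adds a terminator,
  // so anything longer than the field is silently truncated.
  StrLCopy(PAnsiChar(@Msg.Address[0]), PAnsiChar(Address), High(Msg.Address));
  StrLCopy(PAnsiChar(@Msg.Body[0]), PAnsiChar(Body), High(Msg.Body));
end;

// Receiver side (M): pull the strings back out of the raw bytes.
procedure UnpackMail(const Msg: TMailMessage; out Address, Body: AnsiString);
begin
  Address := PAnsiChar(@Msg.Address[0]);
  Body := PAnsiChar(@Msg.Body[0]);
end;

var
  Msg: TMailMessage;
  Address, Body: AnsiString;
begin
  PackMail('joe@example.com', 'Party at my place on Friday!', Msg);
  UnpackMail(Msg, Address, Body);
  Writeln('To:   ', Address);
  Writeln('Text: ', Body);
end.
```

Every new piece of information means changing the record on both sides at once, which is exactly the kind of chore that gets error-prone.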
Messaging has come a long way since it
was introduced. For instance, wouldn't it be more
natural for application M to expose its emailing facility through an object, Messenger, so that C can work with
Messenger by simply calling a SendMail method of Messenger? (note: I'm starting to talk about objects and OO terminology here so if
you're on unfamiliar ground, I suggest you brush up on OO basics)
Figure: Contact Manager talking to the Messenger object
Working with objects this way has distinct advantages:
- You can easily describe and communicate the
entire process to your less-technically minded peers. Imagine describing to
your client how you're going to send a message whose first 100 bytes is the
email address and the next 1024 bytes is the email message, etc (you'd be
lucky if your client is technically inclined).
A more understandable
description would go something like: Application C contacts application M and asks for
a Messenger (business) object. Application M hands C a Messenger object, which C can then use by calling
its SendMail method.
- You don't have to deal with the intricacies
of the low-level messaging and plumbing. Let's say you successfully implement a messaging mechanism between 2 applications using Win32 messaging. Circumstances in the future may require that the 2
applications communicate from different machines. You now have a problem
because Win32 messaging only works within a single machine. Because of this,
you have no choice but to develop a new messaging mechanism that works
across the network. But the question is, do you
really want to do that?
- Working with objects is natural and
easy. Development environments that do not support constructs/facilities
for low-level messaging can easily be used to develop applications that can
talk with other applications through objects.
At this point, you're probably wondering
"If I'm working with objects across 2 applications, isn't some sort of
messaging still needed?" Of course you still need some sort of messaging,
but you don't have to do it yourself - just leave it to the experts! The
experts at Microsoft really are experts and they have built all that grungy
low-level messaging functionality into the bowels of COM.
When I say bowels of COM, I mean that the
implementations are done in DLLs and EXEs referred to as the COM libraries.
Don't be alarmed! You don't need to know which DLLs/EXEs are the COM libraries
because your operating system will take care of running the right COM library
file depending on what aspects of COM you are using.
In this respect, you build
your applications on top of COM and COM is
really just another application that you're trying to be friends with.
The next big question is "Objects are
cool, but how exactly do I work with an object from within another
application?" If you're an OO developer, you're familiar with how you can
create and use objects all from within one application - you simply get a raw
memory pointer to an object and you call methods on it. If you work with an object from another
application, you obviously don't get a raw pointer simply because memory
pointers are generally valid only within a single application. Furthermore, if the other
application is on another machine, obviously there is no such thing as a raw
pointer because pointers are meaningless across machines.
Note: For purposes of
simplicity, when I talk about 2 applications, I normally mean 2 EXEs. As we
shall see later, COM applications can also be libraries (DLLs) in which case you
really do get raw pointers to objects - but we'll get to this.
If we don't get raw pointers, what exactly do
we get?
Well, we still get a pointer but this pointer
is called an interface pointer. An interface pointer is very similar to a raw
object pointer. You use an interface pointer as if it is a raw pointer. The important thing about interface pointers is that an
application works with an interface pointer, and the interface pointer works
with COM to do all the low-level messaging necessary to communicate with the object that
resides in the other application. This is illustrated as follows:
Figure: Interface pointer
In essence, an interface pointer feels
and breathes like a pointer to the real object and the application that uses it couldn't tell
the difference. An obvious advantage to this is that the real object
can exist on another machine and the interface pointer can simply use COM to do
all the low-level network messaging as necessary.
Note that this "COM
across the network" thing I am talking about is also referred to as
Distributed COM or simply DCOM - a term you can use to impress your friends!
Before I go any further, I'd like to introduce
the concept of "clients" and "servers" in COM. A server is
the application that contains objects for other applications to use - in other
words, it provides services to other applications. A client is the application
that uses an object from a server - in other words, it uses the services
provided by the server. From our simple example above, the contact manager application is the
client and the messaging application is the
server. It's important to
know this concept early on because COM literature uses the terms
"client" and "server" when explaining a lot of things.
To put it all in context: The process of
application to application communication using COM can generally be described as
follows: the client talks to an interface pointer, the interface pointer talks
to COM, and COM talks to the server - a really simple concept that took me a
couple of months to understand.
The next question is "What exactly is an
interface pointer and how do I get a hold of it?" Good question!
Let's look at our Messenger object again. We
introduced a SendMail method into Messenger so that our contact
manager application can use that method to send email. If we later want to add more
functionality into Messenger for clients to use, then we simply add
more methods to Messenger. The set of all methods (or functionality) that Messenger
exposes to its clients is referred to as Messenger's interface. In other words an
interface defines a client's view of an object - the interface defines what
a client can do to a given object.
Different objects provide different
functionality and thus different interfaces. If you work with a lot of objects,
it makes sense to name each object's interface similar to how we
humans identify ourselves using names. In COM, an interface is named 2 ways - an
ugly name and a friendly name. The ugly
name takes the form of a sequence of numbers (usually in hex format) and is guaranteed to be statistically unique. What this means is that if
we take all the interfaces in the world together, the probability that 2 interfaces will
have the same ugly name is vanishingly small. The uniqueness of ugly names
enables applications to precisely specify which interfaces they want to work
with.
The problem with ugly names is
that they look something like this: 0000031A-0000-0000-C000-000000000046. Obviously, no
sane person would like to work with ugly names because we humans prefer to use easy to
remember names like "Joe", "Jill", etc. This is where the
friendly name comes in. By convention, a friendly name for an interface consists
of the prefix "I" (I for
Interface) prepended to the name of the object or a description of the object's functionality. For our
Messenger object, IMessenger
would be a conventional friendly name.
In short, if our Messenger object
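provides 2 methods for sending and receiving email, SendMail and GetMail, then we can define the IMessenger interface
as follows:
IMessenger = interface
  procedure SendMail;  // parameters omitted
  procedure GetMail;   // parameters omitted
end;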
In this context, a pointer to an interface
is simply a pointer to an "object" that exposes
the interface. When I refer to "object" in this context, I don't mean
the server object itself - remember that we cannot have pointers to objects across
applications. COM performs a bit of black magic to silently create another
object that resides in the client application (so a pointer to it is no different than a raw
pointer to an internal client object) and using a dose of voodoo, COM makes this
object look like it's the real one by enabling this object to expose the same
exact interface that the server object exposes. Whew! Did you get all that? To
illustrate:
Figure: Interface pointer pointing to a
"shadow" (or proxy) object
To summarize what we've learned so far, COM
enables applications to communicate with each other using the "client works
with an object" concept. A client talks to a server object only through the
object's interface. An interface is nothing but a set of methods that exposes
the object's functionality to its clients.
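The "shadow" object idea can be sketched in a few lines. This is only an in-process imitation: both classes live in one program, and the proxy forwards the call directly instead of handing it to COM. The class names TServerMessenger and TMessengerProxy and the SendMail parameters are assumptions for illustration, not anything COM defines.

```pascal
program ProxyDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // The interface that both the real object and its shadow expose.
  IMessenger = interface
    procedure SendMail(const Address, Body: string);
  end;

  // Stands in for the real Messenger living inside the server application.
  TServerMessenger = class(TInterfacedObject, IMessenger)
  public
    procedure SendMail(const Address, Body: string);
  end;

  // Stands in for the shadow (proxy) object that lives inside the client.
  TMessengerProxy = class(TInterfacedObject, IMessenger)
  private
    FServer: IMessenger;
  public
    constructor Create(const Server: IMessenger);
    procedure SendMail(const Address, Body: string);
  end;

procedure TServerMessenger.SendMail(const Address, Body: string);
begin
  Writeln('[server] sending "', Body, '" to ', Address);
end;

constructor TMessengerProxy.Create(const Server: IMessenger);
begin
  inherited Create;
  FServer := Server;
end;

procedure TMessengerProxy.SendMail(const Address, Body: string);
begin
  // A real proxy would package the arguments and let COM deliver them;
  // this sketch simply forwards the call.
  Writeln('[proxy] forwarding SendMail');
  FServer.SendMail(Address, Body);
end;

var
  Server, Messenger: IMessenger;
begin
  Server := TServerMessenger.Create;
  Messenger := TMessengerProxy.Create(Server);
  // The client only ever sees IMessenger and cannot tell it holds a proxy.
  Messenger.SendMail('joe@example.com', 'Party on Friday!');
end.
```

The point to notice is the last line: the client code would look exactly the same if Messenger pointed straight at the server object.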
A subtle advantage of inter-application
communication using COM is that we can get an excellent degree of reusability. For
instance, if we were to create 10 other applications that each required email
facilities, these applications can simply make friends with our existing Messenger
object:
Figure: Clients reusing Messenger
In other words, we can clearly see that we
need not waste time and money reinventing the wheel when somebody has already
done that for us. What's even better is that that somebody might even be way
better than us at doing what they do best. After all, wouldn't we
prefer that the email functionality be written by experts in electronic
messaging?
Consider a client application (C1) wanting to use the Messenger object from
our messaging server M. C1 contacts M for a Messenger object. M launches,
creates a Messenger object and hands it back to C1, with C1 ultimately getting
back an IMessenger interface pointer. C1 uses IMessenger and then when it's
done, C1 tells M that it doesn't need the Messenger object anymore so M destroys
the object and then terminates.
Consider another scenario where C1 again contacts M for a Messenger object
and gets back IMessenger. But now, while C1 is using IMessenger, a second client
(C2) comes along and also contacts M for the same Messenger object that C1 is currently using.
C2 then gets an IMessenger and while C2 is using it, C1 decides that it's done
using its IMessenger so C1 tells M to destroy it. M's a smart and moral
application and sees that since another client, C2, is still using the Messenger
object, M doesn't destroy the object nor would it want to terminate. If M did
that, C2 would get very mad and would probably not talk to M anymore. Because of
this, M would keep running until C2 explicitly announces that it doesn't need the Messenger object anymore.
So what's M's secret? How does M know who or how many clients are still using
it so as not to accidentally pull the plug on any of them? The answer is really simple and
it's based on a concept called reference counting. Using reference counting, M
keeps a count of how many clients are currently referencing/using it. Strictly
speaking, references are counted at the object level meaning that it's the
Messenger object that actually keeps track of a count of all its active clients.
The counting mechanism works in a deceptively simple way: whenever a client
wants to use Messenger, it tells Messenger to increment the count, and when the
client is done with Messenger, it tells Messenger to decrement the count. This
way, as long as Messenger's count is greater than 0, Messenger knows that there
are still clients connected to it.
Going back to our case of C1 and C2: when C1 contacts M the first time for
Messenger, C1 gets Messenger and tells Messenger to increment its refcount
(shorthand for reference count) to 1 (refcounts always start at 0). While C1 uses Messenger, C2 contacts
M for Messenger and also tells Messenger to increment its refcount, this time raising it up to 2. While C2 uses Messenger, C1 decides that it's done and tells
Messenger to decrement its refcount down to 1. Once C2 decides that
it's done, it tells Messenger to decrement its refcount down to 0. To
Messenger, a 0 refcount means that nobody uses it anymore so Messenger knows
that it's safe to destroy itself and eventually tell M to terminate. Isn't the principle
of reference counting so simple?
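The C1 and C2 walkthrough can be played out with a hand-rolled counter. This is a minimal single-threaded sketch, not how COM itself is implemented. The method names IncrementCount and DecrementCount and the Report helper are hypothetical names chosen for this example.

```pascal
program RefCountDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TMessenger = class
  private
    FRefCount: Integer;   // starts at 0, like every class field
  public
    destructor Destroy; override;
    function IncrementCount: Integer;
    function DecrementCount: Integer;
  end;

destructor TMessenger.Destroy;
begin
  Writeln('Messenger destroyed');
  inherited Destroy;
end;

function TMessenger.IncrementCount: Integer;
begin
  Inc(FRefCount);
  Result := FRefCount;
end;

function TMessenger.DecrementCount: Integer;
begin
  Dec(FRefCount);
  Result := FRefCount;
  if Result = 0 then
    Destroy;   // no clients left, so it is safe to self-destruct
end;

procedure Report(const Event: string; Count: Integer);
begin
  Writeln(Event, ': refcount = ', Count);
end;

var
  Messenger: TMessenger;
begin
  Messenger := TMessenger.Create;
  Report('C1 connects', Messenger.IncrementCount);
  Report('C2 connects', Messenger.IncrementCount);
  Report('C1 disconnects', Messenger.DecrementCount);
  // The last disconnect brings the count to 0 and destroys the object,
  // so Messenger must not be touched after this line.
  Report('C2 disconnects', Messenger.DecrementCount);
end.
```

Notice that nobody outside the object decides when it dies. The clients only announce when they arrive and leave, and the object works out the rest.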
Let's set reference counting aside for a moment and talk about interfaces.
As we said, an interface is a collection of methods that a server object exposes
to its clients. Sometimes, we get too carried away and try to stuff as much as
we can into an interface, easily resulting in an interface with hundreds of methods.
The human brain has a notorious habit of abhorring complexity, i.e. if there are
too many of something to remember, we try to categorize things into manageable
groups and hierarchies that are easy to remember. This habit can also
effectively be applied to interfaces: if we have an interface that has hundreds
of methods, that is an indication that our interface is trying to do too much.
In this case, we normally look at the methods and then categorize them into
smaller groups of common functionality, each of which can be defined as a
separate interface. We then end up with an object that has multiple interfaces which
simplifies a lot of things for the client.
Consider this contrived example for our Messenger object. Previously, we've
already given our Messenger object the ability to send and receive email using
the SendMail and GetMail methods. Later on, we decide to give Messenger fax
messaging capabilities perhaps by adding 2 new methods: SendFax and ReceiveFax.
Being the good designers that we are, we immediately realize that Messenger now
has 2 independent groups of functionality: one that deals with email messaging
and the other that deals with fax messaging. This calls for splitting the functionality into 2 separate interfaces. We'll create an IEmail interface to
deal with email messaging and an IFax interface to deal with fax messaging. Our
Messenger object now exposes 2 interfaces: the IEmail interface and the IFax
interface:
IEmail = interface
  procedure SendMail;
  procedure GetMail;
end;
IFax = interface
  procedure SendFax;
  procedure ReceiveFax;
end;
Figure: Messenger exposing 2
interfaces: IEmail and IFax
You might say, "Well why not just combine the email and fax messaging
functionality which consist of only 4 methods into the one IMessenger
interface?" Consider what happens then: if tomorrow we add 10 more methods to
IMessenger, and the next day we add 20 more methods, and the next week we add 30
more methods, and the next month we add 50 more methods, then we eventually end
up with an IMessenger interface with at least 110 methods. I don't know about
you but that sounds to me like we're bastardizing IMessenger each day into the
future (that is, if it has a future). Although this example is rather simple and
contrived, it still clearly indicates a very important concept: when designing
an object's interface, always create simple, compact, and usable interfaces; if
you need to add functionality that you feel does not belong to an existing
interface, then create a new interface to represent that functionality. Heed
this concept well unless you want to create a mess of crap later on - believe
me, I've been there.
Since a single object can expose multiple interfaces,
an object needs to be
able to give the client whichever interface the client wants. In other words, a
client should be able to query a server object for any of its supported interfaces. Using
our example, if a client wanted to use Messenger to send email, the client
would ask Messenger for its IEmail interface. If a client wanted to use
Messenger to send a fax, the client would ask Messenger for its IFax interface.
More importantly, if a client wanted to send both email and fax, the client
should be able to ask for both IEmail and IFax from a single Messenger object.
The ability to reference count and to query interfaces at runtime is a basic
necessity for effective client to server communication. If the server didn't
reference count, it wouldn't know when and when not to safely self-destruct. If
the server didn't allow interface querying, a client wouldn't be able to get to
all the functionality that the server exposes. Since a client talks to the
server only through its interface, it is logical that the interface should provide
the ability to reference count and perform interface querying. In fact, every
interface should provide the client the ability to reference count and perform
interface querying.
To be more concrete, every interface must:
- Provide a method to increment the server's refcount
- Provide a method to decrement the server's refcount
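- Provide a method to ask a server for any interface that the server
supports
That would mean that our IEmail and IFax interfaces should at least look like
this:
IEmail = interface
  // method to increment the server's refcount
  // method to decrement the server's refcount
  // method to ask the server for any interface it supports
  procedure SendMail;
  procedure GetMail;
end;
IFax = interface
  // the same 3 methods as above
  procedure SendFax;
  procedure ReceiveFax;
end;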
Since this requirement for 3 methods is common to every single interface,
let's go a step further and group the 3 methods together.
In COM, these 3 methods grouped together are what is termed the IUnknown
interface. IUnknown provides the 3 methods:
- QueryInterface - enables a client to query for a server's interface
- AddRef - enables the client to increment the server's refcount
- Release - enables the client to decrement the server's refcount
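IUnknown = interface
  // simplified: parameters and return values omitted
  procedure QueryInterface;
  procedure AddRef;
  procedure Release;
end;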
A very subtle concept here is that a client does not necessarily need to get
to IUnknown to be able to perform refcounting and interface querying. The client
should be able to perform refcounting and interface querying using any
interface of the server. In other words, every interface must contain the 3
IUnknown methods or in geek terms, every interface must inherit from IUnknown.
That includes IEmail, IFax, IWhatever, etc:
IEmail = interface (IUnknown) // IEmail inherits from IUnknown
  procedure SendMail;
  procedure GetMail;
end;
IFax = interface (IUnknown) // IFax inherits from IUnknown
  procedure SendFax;
  procedure ReceiveFax;
end;
Interface inheritance here simply means
that the descendant interface will include all of the base interface methods as
its first few methods.
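Putting the pieces together, here is a minimal sketch of one Messenger object exposing both IEmail and IFax, with the client hopping from one interface to the other. It leans on two things Delphi provides: TInterfacedObject, which supplies the 3 IUnknown methods, and Supports from SysUtils, which calls QueryInterface for us. The two GUIDs are arbitrary values made up for this example, and the parameterless methods are a simplification.

```pascal
program QueryInterfaceDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  IEmail = interface(IUnknown)
    ['{6F1A2C31-8B4D-4E57-9A10-3C5D7E9F0A11}']  // made-up "ugly name"
    procedure SendMail;
    procedure GetMail;
  end;

  IFax = interface(IUnknown)
    ['{6F1A2C32-8B4D-4E57-9A10-3C5D7E9F0A12}']  // made-up "ugly name"
    procedure SendFax;
    procedure ReceiveFax;
  end;

  // One object, two interfaces; refcounting comes from TInterfacedObject.
  TMessenger = class(TInterfacedObject, IEmail, IFax)
  public
    destructor Destroy; override;
    procedure SendMail;
    procedure GetMail;
    procedure SendFax;
    procedure ReceiveFax;
  end;

destructor TMessenger.Destroy;
begin
  Writeln('Messenger destroyed');
  inherited Destroy;
end;

procedure TMessenger.SendMail;   begin Writeln('sending email');  end;
procedure TMessenger.GetMail;    begin Writeln('getting email');  end;
procedure TMessenger.SendFax;    begin Writeln('sending fax');    end;
procedure TMessenger.ReceiveFax; begin Writeln('receiving fax');  end;

var
  Email: IEmail;
  Fax: IFax;
begin
  Email := TMessenger.Create;   // the assignment adds the first reference
  Email.SendMail;
  // Ask the same object for IFax, starting from the IEmail pointer.
  if Supports(Email, IFax, Fax) then
  begin
    Fax.SendFax;
    Email := nil;      // drops one reference; the object stays alive
    Fax.ReceiveFax;    // still valid because Fax holds its own reference
  end;
end.
```

The client never needed a separate IUnknown pointer. Any interface it already holds is enough to reach the others and to keep the object alive.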
Where Are We?
What I've shown you so far is what should be learned first in COM. It's very
hard to learn other things about COM without understanding everything that I
just said here. Trust me, these concepts might seem easy to understand but I can
tell you one thing - you do not know COM if you do not at least understand the
concepts of interfaces and IUnknown.
- Understanding ActiveX and OLE by David Chappell
- Inside COM by Dale Rogerson
- Inside DCOM by the Eddons
- Essential COM by Don Box