Last updated on 1/25/2016

Friendship Building & COM
by Binh Ly

When several smaller applications work together to solve a problem, the result is usually better than what any single application could manage alone - kind of like two minds are better than one.

A Contact Manager Application

Applications have evolved to such a point where no single application can do everything anymore. Imagine developing a contact manager application to keep track of all our friends. We all know that this application will need to store the contact information somewhere, and that somewhere is commonly another application - the database server application. This application may also need automated fax/email capabilities in case we want to send jokes to our friends. For this, it's common sense that we'd want to find another application that provides fax/email capabilities and simply make friends with that application. This application may also need mail-merge capabilities in case we want to print nice party invitations for our friends. For this, we'd probably want to enlist the help of another application that does mail-merges in its sleep. This list can go on and on, but the obvious fact here is that application friendliness is always good and healthy.

Being friendly implies being able to communicate. Communication requires that friends talk with each other using a common and standard language. This is important because we want applications (past, present, and future) to be able to easily join the circle of friends without any communication hassles. Under Windows (or any message-based environment for that matter), a trivial communication mechanism can be accomplished by sending messages between applications. Consider our contact manager application (C) wanting to send email out to a group of contacts using a messaging application (M). Using the Win32 messaging mechanism, C can send a message to M passing along whatever information M needs (target email address, email message, etc.). M then receives the message from C, unpacks the message, and then sends email out using the information obtained from the message:

Figure: Using a messaging mechanism between Contact Manager and Messaging applications

Messaging is an excellent way to communicate and has in fact been used successfully over the years. However, one characteristic of messaging is that working with raw messages is sometimes terse and not for the faint of heart. Complex messaging often involves passing structures of information between applications and data conversions that can get very tedious and error-prone.
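To get a feel for that tedium, here is a minimal Python sketch of the packing and unpacking that raw messaging implies. The fixed layout (a 100-byte address field followed by a 1024-byte body field) and the function names are assumptions made purely for illustration, and no actual Win32 call is made.

```python
import struct

# Hypothetical wire layout: a 100-byte address field followed by a
# 1024-byte body field, both padded with zero bytes.
LAYOUT = struct.Struct("100s1024s")


def pack_mail(address: str, body: str) -> bytes:
    """What C would have to do before sending a raw message to M."""
    raw_address = address.encode("utf-8")
    raw_body = body.encode("utf-8")
    if len(raw_address) > 100 or len(raw_body) > 1024:
        raise ValueError("field does not fit in the fixed layout")
    # struct pads short byte strings with zero bytes up to the field width.
    return LAYOUT.pack(raw_address, raw_body)


def unpack_mail(message: bytes) -> tuple:
    """What M would have to do after receiving the raw message."""
    raw_address, raw_body = LAYOUT.unpack(message)
    return (raw_address.rstrip(b"\0").decode("utf-8"),
            raw_body.rstrip(b"\0").decode("utf-8"))


message = pack_mail("joe@example.com", "Party on Friday!")
print(len(message))            # size of the packed message in bytes
print(unpack_mail(message))    # the fields as M would recover them
```

Both sides must agree on every field width and encoding, and any change to the layout means changing both applications at once.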

Messaging has come a long way since it was introduced. For instance, wouldn't it be more natural for application M to expose its emailing facility through an object, Messenger, so that C can work with Messenger by simply calling a SendMail method of Messenger? (Note: I'm starting to talk about objects and OO terminology here, so if you're on unfamiliar ground, I suggest you brush up on OO basics.)

Figure: Contact Manager talking to Messenger object

Working with objects this way has distinct advantages:

  1. You can easily describe and communicate the entire process to your less technically minded peers. Imagine describing to your client how you're going to send a message whose first 100 bytes are the email address and whose next 1024 bytes are the email message, etc. (you'd be lucky if your client is technically inclined).

    A more understandable description would go something like: Application C contacts application M and asks for a Messenger (business) object. Application M hands C a Messenger object which C can then use by calling Messenger.SendMail.
  2. You don't have to deal with the intricacies of the low-level messaging and plumbing. Let's say you successfully implement a messaging mechanism between 2 applications using Win32 messaging. Circumstances in the future may require that the 2 applications communicate from different machines. You now have a problem because Win32 messaging only works within a single machine. Because of this, you have no choice but to develop a new messaging mechanism that works across the network. But the question is, do you really want to do that?
  3. Working with objects is natural and easy. Development environments that do not support constructs/facilities for low-level messaging can easily be used to develop applications that can talk with other applications through objects.

At this point, you're probably wondering "If I'm working with objects across 2 applications, isn't some sort of messaging still needed?" Of course you still need some sort of messaging, but you don't have to do it yourself - just leave it to the experts! The experts at Microsoft really are experts and they have built all that grungy low-level messaging functionality into the bowels of COM.

When I say bowels of COM, I mean that the implementations are done in DLLs and EXEs referred to as the COM libraries. Don't be alarmed! You don't need to know which DLLs/EXEs are the COM libraries because your operating system will take care of running the right COM library file depending on what aspects of COM you are using.

In this respect, you build your applications on top of COM and COM is really just another application that you're trying to be friends with.

The next big question is "Objects are cool, but how exactly do I work with an object from within another application?" If you're an OO developer, you're familiar with how you can create and use objects all from within one application - you simply get a raw memory pointer to an object and you call methods on it. If you work with an object from another application, you obviously don't get a raw pointer simply because memory pointers are generally valid only within a single application. Furthermore, if the other application is on another machine, obviously there is no such thing as a raw pointer because pointers are meaningless across machines. 

Note: For purposes of simplicity, when I talk about 2 applications, I normally mean 2 EXEs. As we shall see later, COM applications can also be libraries (DLLs) in which case you really do get raw pointers to objects - but we'll get to this.

If we don't get raw pointers, what exactly do we get?

Well, we still get a pointer but this pointer is called an interface pointer. An interface pointer is very similar to a raw object pointer. You use an interface pointer as if it is a raw pointer. The important thing about interface pointers is that an application works with an interface pointer, and the interface pointer works with COM to do all the low-level messaging necessary to communicate with the object that resides in the other application. This is illustrated as follows:

Figure: Interface pointer abstraction

In essence, an interface pointer feels and breathes like a pointer to the real object and the application that uses it couldn't tell the difference. An obvious advantage to this is that the real object can exist on another machine and the interface pointer can simply use COM to do all the low-level network messaging as necessary. 

Note that this "COM across the network" thing I am talking about is also referred to as Distributed COM or simply DCOM - a term you can use to impress your friends!

Before I go any further, I'd like to introduce the concept of "clients" and "servers" in COM. A server is the application that contains objects for other applications to use - in other words, it provides services to other applications. A client is the application that uses an object from a server - in other words, it uses the services provided by the server. From our simple example above, the contact manager application is the client and the messaging application is the server. It's important to know this concept early on because COM literature uses the terms "client" and "server" when explaining a lot of things.

To put it all in context: The process of application to application communication using COM can generally be described as follows: the client talks to an interface pointer, the interface pointer talks to COM, and COM talks to the server - a really simple concept that took me a couple of months to understand.
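If it helps, the chain can be mimicked in a few lines of Python. This is a toy model and not real COM: the class names ComRuntime and MessengerProxy, the SendMail parameters, and the use of JSON as the wire format are all assumptions chosen for illustration.

```python
import json


class Messenger:
    """The real object; it lives in the server application."""

    def __init__(self):
        self.outbox = []

    def SendMail(self, address, body):
        self.outbox.append((address, body))
        return "sent to " + address


class ComRuntime:
    """Stand-in for COM: carries calls between client and server as bytes."""

    def __init__(self, server_object):
        self._server_object = server_object

    def call(self, request: bytes) -> bytes:
        # Server side: unpack the request and invoke the real method.
        decoded = json.loads(request.decode("utf-8"))
        method = getattr(self._server_object, decoded["method"])
        result = method(*decoded["args"])
        return json.dumps(result).encode("utf-8")


class MessengerProxy:
    """Stand-in for the interface pointer held by the client."""

    def __init__(self, runtime):
        self._runtime = runtime

    def SendMail(self, address, body):
        # Client side: pack the call, hand it to the runtime, unpack the reply.
        request = json.dumps({"method": "SendMail", "args": [address, body]})
        reply = self._runtime.call(request.encode("utf-8"))
        return json.loads(reply.decode("utf-8"))


server_object = Messenger()
messenger = MessengerProxy(ComRuntime(server_object))
print(messenger.SendMail("joe@example.com", "Party on Friday!"))
```

The client only ever calls messenger.SendMail; the packing, transport, and unpacking all happen behind its back.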

The next question is "What exactly is an interface pointer and how do I get a hold of it?" Good question! 


Let's look at our Messenger object again. We introduced a SendMail method into Messenger so that our contact manager application can use that method to send email. If we later want to add more functionality into Messenger for clients to use, then we simply add more methods to Messenger. The set of all methods (or functionality) that Messenger exposes to its clients is referred to as Messenger's interface. In other words an interface defines a client's view of an object - the interface defines what a client can do to a given object.

Different objects provide different functionality and thus different interfaces. If you work with a lot of objects, it makes sense to name each object's interface similar to how we humans identify ourselves using names. In COM, an interface is named 2 ways - an ugly name and a friendly name. The ugly name takes the form of a sequence of numbers (usually hex format) and is guaranteed to be statistically unique. What this means is that if we take all the interfaces in the world together, the probability that 2 interfaces will have the same ugly name is vanishingly small. The uniqueness of ugly names enables applications to precisely specify which interfaces they want to work with.

The problem with ugly names is that they look something like this: 0000031A-0000-0000-C000-000000000046. Obviously, no sane person would like to work with ugly names because we humans prefer to use easy to remember names like "Joe", "Jill", etc. This is where the friendly name comes in. By convention, a friendly name for an interface consists of the prefix "I" (I for Interface) prepended to the name of the object or a description of the object's functionality. For our Messenger object, IMessenger would be a conventional friendly name. 
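For the curious, the ugly name above has the textual shape of a 128-bit universally unique identifier (COM calls these GUIDs), so Python's standard uuid module can parse it and generate fresh ones. The sketch below is only meant to show the shape and size of such names; the variable names are made up, and generating an identifier this way does not register anything with COM.

```python
import uuid

# The ugly name quoted in the text, parsed as a 128-bit identifier.
ugly_name = uuid.UUID("0000031A-0000-0000-C000-000000000046")
print(len(ugly_name.bytes) * 8)    # number of bits in the identifier

# A freshly generated random identifier; with 122 random bits, two
# independently generated names are extremely unlikely to collide.
IID_IMessenger = uuid.uuid4()

# A table like this lets people use the friendly name while the ugly name
# does the precise identification (hypothetical, for illustration only).
friendly_names = {IID_IMessenger: "IMessenger"}
print(friendly_names[IID_IMessenger], str(IID_IMessenger).upper())
```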

In short, if our Messenger object provides 2 methods for sending and receiving email, SendMail and GetMail, then we can define the IMessenger interface as follows:

IMessenger = interface
  procedure SendMail;
  procedure GetMail;
end;

In this context, a pointer to an interface is simply a pointer to an "object" that exposes the interface. When I refer to "object" in this context, I don't necessarily mean the server object itself - remember that we cannot have pointers to objects across applications. COM performs a bit of black magic to silently create another object that resides in the client application, so the client's pointer to it is no different from a raw pointer to any other object inside the client. Using a dose of voodoo, COM makes this object look like the real one by having it expose the exact same interface that the server object exposes. Whew! Did you get all that? To further illustrate:

Figure: Interface pointer pointing to a "shadow" (or proxy) object

To summarize what we've learned so far, COM enables applications to communicate with each other using the "client works with an object" concept. A client talks to a server object only through the object's interface. An interface is nothing but a set of methods that exposes the object's functionality to its clients.

A subtle advantage of inter-application communication using COM is that we can get an excellent degree of reusability. For instance, if we were to create 10 other applications that each required email facilities, these applications can simply make friends with our existing Messenger object.

Figure: Clients reusing Messenger

In other words, we can clearly see that we need not waste time and money reinventing the wheel when somebody has already done that for us. What's even better is that that somebody might even be way better than us at doing what they do best. After all, wouldn't we prefer that the email functionality be written by experts in electronic messaging applications?

Reference Counting

Consider a client application (C1) wanting to use the Messenger object from our messaging server M. C1 contacts M for a Messenger object. M launches, creates a Messenger object and hands it back to C1, with C1 ultimately getting back an IMessenger interface pointer. C1 uses IMessenger and then when it's done, C1 tells M that it doesn't need the Messenger object anymore so M destroys the object and then terminates.

Consider another scenario where C1 again contacts M for a Messenger object and gets back IMessenger. But now, while C1 is using IMessenger, a second client (C2) comes along and also contacts M for the same Messenger object that C1 is currently using. C2 then gets an IMessenger and while C2 is using it, C1 decides that it's done using its IMessenger so C1 tells M to destroy it. M's a smart and moral application and sees that since another client, C2, is still using the Messenger object, M doesn't destroy the object nor would it want to terminate. If M did that, C2 would get very mad and would probably not talk to M anymore. Because of this, M would keep running until C2 explicitly announces that it doesn't need the Messenger object anymore.

So what's M's secret? How does M know who or how many clients are still using it so as not to accidentally pull the plug on any of them? The answer is really simple and it's based on a concept called reference counting. Using reference counting, M keeps a count of how many clients are currently referencing/using it. Strictly speaking, references are counted at the object level meaning that it's the Messenger object that actually keeps track of a count of all its active clients. The counting mechanism works in a deceptively simple way: whenever a client wants to use Messenger, it tells Messenger to increment the count, and when the client is done with Messenger, it tells Messenger to decrement the count. This way, as long as Messenger's count is greater than 0, Messenger knows that there are still clients connected to it.

Going back to our case of C1 and C2: when C1 contacts M the first time for Messenger, C1 gets Messenger and tells Messenger to increment its refcount (shorthand for reference count) to 1 (refcounts always start at 0). While C1 uses Messenger, C2 contacts M for Messenger and also tells Messenger to increment its refcount, this time raising it up to 2. While C2 uses Messenger, C1 decides that it's done and tells Messenger to decrement its refcount down to 1. Once C2 decides that it's done, it tells Messenger to decrement its refcount down to 0. To Messenger, a 0 refcount means that nobody uses it anymore so Messenger knows that it's safe to destroy itself and eventually tell M to terminate. Isn't the principle of reference counting so simple?
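The C1 and C2 walk-through can be played out in a small Python model. This is a sketch under simplifying assumptions: MessengerServer, object_destroyed, and the method names are invented for illustration, and everything runs inside one process rather than across applications.

```python
class Messenger:
    """Toy Messenger object that counts the clients using it."""

    def __init__(self, server):
        self._server = server
        self.refcount = 0              # refcounts start at 0

    def increment_refcount(self):
        self.refcount += 1
        return self.refcount

    def decrement_refcount(self):
        self.refcount -= 1
        if self.refcount == 0:
            # Nobody uses the object anymore: destroy it and tell M.
            self._server.object_destroyed()
        return self.refcount


class MessengerServer:
    """Toy model of server M: it runs while its Messenger object is alive."""

    def __init__(self):
        self.running = False
        self.messenger = None

    def get_messenger(self):
        # Launch on the first request and create the single shared object.
        if self.messenger is None:
            self.running = True
            self.messenger = Messenger(self)
        return self.messenger

    def object_destroyed(self):
        self.messenger = None
        self.running = False           # no objects left, so M terminates


m = MessengerServer()
c1 = m.get_messenger()
c1.increment_refcount()                # C1 connects
c2 = m.get_messenger()
c2.increment_refcount()                # C2 connects to the same object
c1.decrement_refcount()                # C1 is done
print(m.running)                       # is M still running for C2?
c2.decrement_refcount()                # C2 is done
print(m.running)                       # is M still running now?
```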

Interface Querying

Forget about reference counting for a moment and let's talk about interfaces. As we said, an interface is a collection of methods that a server object exposes to its clients. Sometimes, we get too carried away and try to stuff as much as we can into an interface, easily resulting in an interface with hundreds of methods. The human brain has a notorious habit of abhorring complexity, i.e. if there are too many of something to remember, we try to categorize things into manageable groups and hierarchies that are easy to remember. This habit can also effectively be applied to interfaces: if we have an interface that has hundreds of methods, that is an indication that our interface is trying to do too much. In this case, we normally look at the methods and then categorize them into smaller groups of common functionality, each of which can be defined as a separate interface. We then end up with an object that has multiple interfaces, which simplifies a lot of things for the client.

Consider this contrived example for our Messenger object. Previously, we've already given our Messenger object the ability to send and receive email using the SendMail and GetMail methods. Later on, we decide to give Messenger fax messaging capabilities perhaps by adding 2 new methods: SendFax and ReceiveFax. Being the good designers that we are, we immediately realize that Messenger now has 2 independent groups of functionality: one that deals with email messaging and the other that deals with fax messaging. This now qualifies as a categorization into 2 separate interfaces. We'll create an IEmail interface to deal with email messaging and an IFax interface to deal with fax messaging. Our Messenger object now exposes 2 interfaces: the IEmail interface and the IFax interface:

IEmail = interface
  procedure SendMail;
  procedure GetMail;
end;

IFax = interface
  procedure SendFax;
  procedure ReceiveFax;
end;

Figure: Messenger exposing 2 interfaces: IEmail and IFax

You might say, "Well why not just combine the email and fax messaging functionality, which consists of only 4 methods, into the one IMessenger interface?" My answer is this: if tomorrow we add 10 more methods to IMessenger, and the next day we add 20 more, and the next week 30 more, and the next month 50 more, then we eventually end up with an IMessenger interface with at least 110 methods. I don't know about you but that sounds to me like we're bastardizing IMessenger each day into the future (that is, if it has a future). Although this example is rather simple and contrived, it still clearly indicates a very important concept: when designing an object's interface, always create simple, compact, and usable interfaces; if you need to add functionality that you feel does not belong to an existing interface, then create a new interface to represent that functionality. Heed this concept well unless you want to create a mess of crap later on - believe me, I've been there.

Since a single object can expose multiple interfaces, an object needs to be able to give the client whichever interface the client wants. In other words, a client should be able to query a server object for any of its supported interfaces. Using our example, if a client wanted to use Messenger to send email, the client would ask Messenger for its IEmail interface. If a client wanted to use Messenger to send a fax, the client would ask Messenger for its IFax interface. More importantly, if a client wanted to send both email and fax, the client should be able to ask for both IEmail and IFax from a single Messenger object.
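Here is a rough Python model of a client asking one Messenger object for each of its interfaces. The query_interface method, the InterfacePointer class, the lookup by friendly name, and the method parameters are assumptions made for this sketch. Returning None for an unsupported interface is also a simplification; real COM reports that failure through an error code.

```python
class InterfacePointer:
    """Exposes only the methods that belong to one named interface."""

    def __init__(self, name, target, method_names):
        self.name = name
        for method_name in method_names:
            setattr(self, method_name, getattr(target, method_name))


class Messenger:
    # Which methods belong to which interface (names taken from the text).
    INTERFACES = {
        "IEmail": ("SendMail", "GetMail"),
        "IFax": ("SendFax", "ReceiveFax"),
    }

    def query_interface(self, name):
        """Return the requested interface, or None if it is not supported."""
        if name not in self.INTERFACES:
            return None
        return InterfacePointer(name, self, self.INTERFACES[name])

    def SendMail(self, address, body):
        return "email to " + address

    def GetMail(self):
        return []

    def SendFax(self, number, page):
        return "fax to " + number

    def ReceiveFax(self):
        return []


messenger = Messenger()
email = messenger.query_interface("IEmail")
fax = messenger.query_interface("IFax")
print(email.SendMail("joe@example.com", "Party on Friday!"))
print(fax.SendFax("555-0100", "Party on Friday!"))
print(hasattr(email, "SendFax"))                # does IEmail offer fax methods?
print(messenger.query_interface("IPrinter"))    # an interface Messenger lacks
```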

The ability to reference count and to query interfaces at runtime is a basic necessity for effective client to server communication. If the server didn't reference count, it wouldn't know when and when not to safely self-destruct. If the server didn't allow interface querying, a client wouldn't be able to get to all the functionality that the server exposes. Since a client talks to the server only through its interface, it is logical that the interface should provide the ability to reference count and perform interface querying. In fact, every interface should provide the client the ability to reference count and perform interface querying.

To be more concrete, every interface must:

  1. Provide a method to increment the server's refcount
  2. Provide a method to decrement the server's refcount
  3. Provide a method to ask a server for any interface that the server exposes

That would mean that our IEmail and IFax interfaces should at least look like this:

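IEmail = interface
  procedure IncrementRefCount;  // needed by every interface
  procedure DecrementRefCount;  // needed by every interface
  procedure QueryInterface;     // needed by every interface
  procedure SendMail;
  procedure GetMail;
end;

IFax = interface
  procedure IncrementRefCount;  // needed by every interface
  procedure DecrementRefCount;  // needed by every interface
  procedure QueryInterface;     // needed by every interface
  procedure SendFax;
  procedure ReceiveFax;
end;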

Since this requirement for 3 methods is common to every single interface, let's go a step further and group the 3 methods together.

IUnknown, the Mother of All COM Interfaces

In COM, these 3 methods grouped together are what is known as the IUnknown interface. IUnknown provides the 3 methods:

  1. QueryInterface - enables a client to query for a server's interface
  2. AddRef - enables the client to increment the server's refcount
  3. Release - enables the client to decrement the server's refcount

IUnknown = interface
  procedure QueryInterface;
  procedure AddRef;
  procedure Release;
end;

A very subtle concept here is that a client does not necessarily need to get to IUnknown to be able to perform refcounting and interface querying. The client should be able to perform refcounting and interface querying using any interface of the server. In other words, every interface must contain the 3 IUnknown methods or in geek terms, every interface must inherit from IUnknown. That includes IEmail, IFax, IWhatever, etc:

IEmail = interface (IUnknown)  // IEmail inherits from IUnknown
  procedure SendMail;
  procedure GetMail;
end;

IFax = interface (IUnknown)  // IFax inherits from IUnknown
  procedure SendFax;
  procedure ReceiveFax;
end;
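To see why it matters that every interface carries the 3 IUnknown methods, here is a toy Python model in which a client holding only IEmail can still reach IFax and manage the shared refcount. The query and release helpers on Messenger, the alive flag, and the return values are assumptions for illustration. The sketch also lets QueryInterface bump the refcount on the caller's behalf, which is how real COM behaves but is not spelled out in the text above.

```python
class Messenger:
    """The server object: one refcount shared by all of its interfaces."""

    def __init__(self):
        self.refcount = 0
        self.alive = True
        self._interfaces = {"IEmail": IEmail(self), "IFax": IFax(self)}

    def query(self, name):
        interface = self._interfaces.get(name)
        if interface is not None:
            self.refcount += 1     # handing out an interface counts as a use
        return interface

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.alive = False     # the object would destroy itself here
        return self.refcount


class IUnknown:
    """Base of every interface; forwards the 3 methods to the owning object."""

    def __init__(self, owner):
        self._owner = owner

    def QueryInterface(self, name):
        return self._owner.query(name)

    def AddRef(self):
        self._owner.refcount += 1
        return self._owner.refcount

    def Release(self):
        return self._owner.release()


class IEmail(IUnknown):
    def SendMail(self):
        return "mail sent"

    def GetMail(self):
        return []


class IFax(IUnknown):
    def SendFax(self):
        return "fax sent"

    def ReceiveFax(self):
        return []


messenger = Messenger()
email = messenger.query("IEmail")      # the client's first interface
fax = email.QueryInterface("IFax")     # IFax obtained through IEmail
print(messenger.refcount)              # both interfaces count as uses
email.Release()
fax.Release()
print(messenger.alive)                 # has the object been destroyed?
```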

Interface inheritance here simply means that the descending interface will include all of the base interface methods as its first few methods.
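The "first few methods" rule can be shown with a small Python model that lists each interface's methods in order. The METHODS and BASE attributes and the method_slots helper are invented for this sketch; they only model the ordering and are not how COM actually stores interfaces.

```python
class IUnknown:
    METHODS = ("QueryInterface", "AddRef", "Release")
    BASE = None


class IEmail:
    METHODS = ("SendMail", "GetMail")
    BASE = IUnknown                # IEmail inherits from IUnknown


class IFax:
    METHODS = ("SendFax", "ReceiveFax")
    BASE = IUnknown                # IFax inherits from IUnknown


def method_slots(interface):
    """List an interface's methods in order, base interface methods first."""
    inherited = method_slots(interface.BASE) if interface.BASE else []
    return inherited + list(interface.METHODS)


print(method_slots(IEmail))
print(method_slots(IFax))
```

Because the first three entries are the same for every interface, a client can call QueryInterface, AddRef, or Release without knowing which interface it is holding.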

Where Are We?

What I've shown you so far is what should be learned first in COM. It's very hard to learn other things about COM without understanding everything that I just said here. Trust me, these concepts might seem easy to understand but I can tell you one thing - you do not know COM if you do not at least understand the concepts of interfaces and IUnknown.

Further Reading

  • Understanding ActiveX and OLE by David Chappell
  • Inside COM by Dale Rogerson
  • Inside Distributed COM by Guy Eddon and Henry Eddon
  • Essential COM by Don Box
Copyright (c) 1999-2011 Binh Ly. All Rights Reserved.