Hello MSXML
by Binh Ly
Microsoft's XML parser, MSXML, is a COM-based
library used to build high-performance applications that manipulate XML
documents. MSXML is currently on version
3.0 SP1. There is also a technology preview of MSXML
4 from MSDN, with major improvements in the areas of XML
Schemas (XSD). In this lesson, we'll look into:
- Installing and using MSXML
- Understanding the basics of MSXML programming
Installing MSXML
MSXML 3 can be obtained from the Microsoft site. Standard Windows installations (including the latest versions of Internet Explorer) typically have an
older version of MSXML (such as 2.5). For development and production purposes, I recommend
downloading and installing MSXML 3 SP1 or higher. When deploying applications in a production
environment where you need to use MSXML from a client/desktop machine, MSXML can
be redistributed as a downloadable CAB file.
The lessons in this series are based on MSXML 3 SP1 or higher.
Programming MSXML
MSXML is heavily COM-based. Because
Delphi has had excellent support for building COM client applications since
version 3, using MSXML from Delphi is as easy as using any other COM library.
If you are not familiar with building COM applications in Delphi, check out my Delphi
COM lessons.
From a high-level view, MSXML supports
the following basic standards:
- Document Object Model (DOM)
- Simple API for XML (SAX)
- XML Namespaces
- XSLT and the XML Path Language (XPath)
- Document Type Definitions (DTD) and
XML Schemas (XDR in MSXML 3, XSD in MSXML 4)
MSXML is a validating parser. In simple
terms, MSXML can check whether XML documents conform to a specific schema
(DTD, XDR, or XSD). This spares you from hand-writing tedious validation code
that differs from one application to the next. In addition, MSXML validation is
optional and can be programmatically turned on and off as needed.
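As a rough illustration of turning validation on and off, the sketch below sets the DOM document's validateOnParse property before loading. It assumes the MSXML2_TLB import unit described later in this lesson. LoadWithValidation is a hypothetical helper name, and the document being loaded is assumed to reference a DTD or schema of its own.

```pascal
uses
  SysUtils, MSXML2_TLB;

//LoadWithValidation is a hypothetical helper, not part of MSXML
function LoadWithValidation (const FileName: string;
  Validate: Boolean): IXMLDOMDocument;
begin
  Result := CoDOMDocument.Create;
  Result.async := False;
  //validateOnParse decides whether the parser validates during load;
  //well-formedness is checked either way
  Result.validateOnParse := Validate;
  if not Result.load (FileName) then
    raise Exception.Create (Result.parseError.reason);
end;
```

Passing False for Validate skips validation while still rejecting documents that are not well-formed.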
As with any other COM library, the
first step to using MSXML is to import its interface type information. This is done using
Project | Import Type Library in the IDE or by running the tlibimp.exe command-line utility.
When importing, select
"Microsoft XML (version 3.0)", turn off "Generate Component
Wrapper" (for D5 and above), and select the "Create Unit" option.
This process will then produce a module named MSXML2_TLB.
Note that the import module is
named MSXML2_TLB instead of MSXML3_TLB. This idiosyncrasy is due to
Microsoft's confusing programmatic versioning scheme for MSXML.
It is important to install the
latest SP/UP for your specific version of Delphi before performing this
import process. I've heard that the first version of Delphi 6 has some
bugs in the type library importer. D6 users can stick to the D5 import,
wait for the next SP, or use my TypeExport utility.
Delphi 6 provides native XML
capabilities based on a library of XML classes. We will not be covering
the native Delphi XML classes in these lessons because they require a
different scope in syntax, semantics, and feature discussions.
With the generated module in hand, we
can now use this module to unleash the full power of MSXML. Let's start with a
simple example:
uses
  MSXML2_TLB;
procedure TForm1.LoadXMLDocumentClick(Sender: TObject);
var
  doc: IXMLDOMDocument;
  MessageText: string;
begin
  //create DOM document instance
  doc := CoDOMDocument.Create;
  //prepare to load an XML document in synchronous mode
  doc.async := False;
  //load an XML document from file
  if doc.load ('helloworld.xml') then
  begin
    //extract text value of "message" element
    MessageText := doc.documentElement.childNodes [0].text;
    ShowMessage (MessageText);
  end
  else
    //error loading, display error
    ShowMessage (Format ('Error loading XML document.'#13 +
      'Error number: %d'#13 +
      'Reason: %s'#13 +
      'Line: %d'#13 +
      'Column: %d',
      [doc.parseError.errorCode,
       doc.parseError.reason,
       doc.parseError.line,
       doc.parseError.linePos]));
end;
The helloworld.xml file is:
<root>
  <message>Hello World</message>
</root>
The above example can be dissected as
follows:
- A reference to the MSXML2_TLB module
is first added to the uses clause.
- An XML DOM document instance is
created into the doc variable.
- A "helloworld.xml" file is
loaded into the doc DOM instance in synchronous mode. MSXML's DOM class loads XML documents in asynchronous mode by
default. Thus it is necessary
to reset the doc.async property to False before loading the document.
- On a successful load, the first
child element (message) of the document element (root) is inspected and its
text value ("Hello World") is extracted. This value is then
displayed.
- On an unsuccessful load, an error
message is displayed based on detailed error information contained in the DOM
document instance's parseError property.
If you've never used MSXML before (or
if you've used another XML parser on a different platform), it is important to
understand the mechanics of the above example line-by-line. Although the
example is trivial, it illustrates the essential semantics and nuances of
using MSXML as an XML parser and as a COM library.
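The example above reaches the message element by position (childNodes [0]). As a possible alternative, the sketch below looks the element up by path with selectSingleNode, which does not depend on the element being the first child. ReadMessage is a hypothetical helper name, and the path assumes the helloworld.xml layout shown above.

```pascal
uses
  MSXML2_TLB;

//ReadMessage is a hypothetical helper, not part of MSXML
function ReadMessage (const FileName: string): string;
var
  doc: IXMLDOMDocument;
  node: IXMLDOMNode;
begin
  Result := '';
  doc := CoDOMDocument.Create;
  doc.async := False;
  if doc.load (FileName) then
  begin
    //locate the element by path rather than by position
    node := doc.selectSingleNode ('/root/message');
    //selectSingleNode returns nil when nothing matches
    if node <> nil then
      Result := node.text;
  end;
end;
```

This version returns an empty string when the load fails or the element is missing; a real application would probably want to report parseError as the original example does.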
The DOM document class can also be used to
load XML documents from various sources using various mechanics. For instance:
uses
  MSXML2_TLB;
procedure TForm1.LoadXMLDocumentClick(Sender: TObject);
var
  doc: IXMLDOMDocument;
  S: string;
begin
  //create DOM document instance
  doc := CoDOMDocument.Create;
  //prepare to load an XML document in synchronous mode
  doc.async := False;
  //load an XML document from URL
  if doc.load ('http://www.nowhere.com/test.xml') then ...
  //load an XML document from an XML string
  S := '<nothing-good/>';
  if doc.loadXML (S) then ...
  //load an XML document from ASP request stream/object
  //this is basically an IStream load
  if doc.load (ASPRequest) then ...
  //write XML directly from an ADO recordset into DOM instance
  //this is basically an IStream write
  ADORecordset.Save (doc, adPersistXML);
end;
In any case, it is important to note
the doc.async usage and the if-then test for doc.load and doc.loadXML to
determine if the load operation is successful or not.
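One way to keep that success test in a single place is to wrap it in a small function. The following is only a sketch: TryParseXML is a hypothetical helper name, and it reports nothing beyond the parser's reason text.

```pascal
uses
  MSXML2_TLB;

//TryParseXML is a hypothetical helper, not part of MSXML
function TryParseXML (const XML: string; out Doc: IXMLDOMDocument;
  out ErrorText: string): Boolean;
begin
  Doc := CoDOMDocument.Create;
  Doc.async := False;
  //loadXML returns False when the string is not well-formed
  Result := Doc.loadXML (XML);
  if Result then
    ErrorText := ''
  else
    ErrorText := Doc.parseError.reason;
end;
```

For example, TryParseXML ('<a><b></a>', doc, err) returns False because the b element is never closed, and err then holds the parser's description of the problem.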
Since there's a Load, there's also a Save.
Save is used to persist the string representation of an XML document. For
instance:
uses
  MSXML2_TLB;
procedure TForm1.SaveXMLDocumentClick(Sender: TObject);
var
  doc: IXMLDOMDocument;
  S: string;
begin
  //create DOM document instance
  doc := CoDOMDocument.Create;
  //prepare to load an XML document in synchronous mode
  doc.async := False;
  //load an XML document
  S := '<nothing-good/>';
  if doc.loadXML (S) then
  begin
    //save to file
    doc.save ('nothinggood.xml');
  end;
end;
The end result of the above example is
the "<nothing-good/>" XML string persisted into the
nothinggood.xml file. Saving an XML document contained in a DOM document
instance can also be done in several ways:
uses
  MSXML2_TLB;
procedure TForm1.SaveXMLDocumentClick(Sender: TObject);
var
  doc: IXMLDOMDocument;
  S: string;
begin
  //create DOM document instance
  doc := CoDOMDocument.Create;
  //prepare to load an XML document in synchronous mode
  doc.async := False;
  //load an XML document
  S := '<nothing-good/>';
  if doc.loadXML (S) then
  begin
    //extract XML string from DOM instance and save it using other mechanics
    S := doc.xml;
    SaveString (S);
    //write an XML document to an ASP response stream/object
    //this is basically an IStream extraction
    ASPResponse.Write (doc);
  end;
end;
The above examples give us a general
overview of the mechanics of programming MSXML. The remainder of the lessons in
this series will drill down into the details of how MSXML implements the
various XML standards and features. For now, let's take a quick look at what's contained in the MSXML module (MSXML2_TLB):
Classes:
| CoDOMDocument | DOM document class |
| CoFreeThreadedDOMDocument | High performance DOM document class. Used for multiple thread access to a single DOM document instance. |
| CoXSLTemplate | High performance XSL stylesheet cache class. Used for repetitive XSL transformations. |
| CoXMLHTTP | Client-side HTTP access component. Used to send and receive documents (primarily XML documents) across HTTP. |
| CoServerXMLHTTP | High performance server-side HTTP access component. Used to send and receive documents (primarily XML documents) across HTTP from within a server application. |
| CoSAXXMLReader | SAX parser engine |
Interfaces:
| IXMLDOMNode | DOM node interface |
| IXMLDOMNodeList | Collection of DOM nodes |
| IXMLDOMDocument | DOM document interface |
| IXMLDOMElement | DOM element interface |
| IXMLDOMAttribute | DOM attribute interface |
| IXMLDOMCharacterData | DOM character data manipulation interface |
| IXMLDOMText | DOM text interface |
| IXMLDOMComment | DOM comment interface |
| IXMLDOMCDATASection | DOM CDATA interface |
| IXMLDOMProcessingInstruction | DOM PI interface |
| IXMLDOMParseError | DOM document parse error interface |
| IXSLTemplate | XSL template stylesheet cache interface |
| IXSLProcessor | XSL processor engine interface |
| IVBSAXXMLReader | SAX parser engine interface |
| IVBSAXContentHandler | SAX content event handler interface |
| IVBSAXAttributes | SAX attributes interface (the attribute collection passed to content handler events) |
| IVBSAXErrorHandler | SAX error handler event interface |
| IXMLHTTPRequest | Client-side HTTP access interface |
| IServerXMLHTTPRequest | Server-side HTTP access interface |
Detailed documentation/reference on the
above and the entire MSXML package can be obtained by separately downloading the
MSXML SDK. For MSXML 4.0, the SDK is already included in the parser
package.
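To give a feel for how a few of the interfaces listed above work together, the sketch below walks an IXMLDOMNodeList and collects the names of the element children of a node. ListChildElements is a hypothetical helper name; NODE_ELEMENT is the node type constant from the imported type library.

```pascal
uses
  Classes, MSXML2_TLB;

//ListChildElements is a hypothetical helper, not part of MSXML
procedure ListChildElements (const Parent: IXMLDOMNode; Names: TStrings);
var
  nodes: IXMLDOMNodeList;
  i: Integer;
begin
  nodes := Parent.childNodes;
  for i := 0 to nodes.length - 1 do
    //skip text, comment, and other non-element nodes
    if nodes [i].nodeType = NODE_ELEMENT then
      Names.Add (nodes [i].nodeName);
end;
```

Called with doc.documentElement from the helloworld.xml example, this should add the single name "message" to the list.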
An important detail to realize up front
is the mechanics of MSXML's COM implementation. For
instance, a DOM element is a special type of a DOM node. In MSXML terms, an
IXMLDOMElement derives from an IXMLDOMNode:
type
  IXMLDOMElement = interface(IXMLDOMNode)
    ...
  end;
Using the above definition, an
IXMLDOMNode pointer to a DOM element is cast to an IXMLDOMElement as follows:
function NodeAsElement (const Node: IXMLDOMNode): IXMLDOMElement;
begin
  //use COM QueryInterface cast
  //raises an exception if Node is not an element
  Result := Node as IXMLDOMElement;
end;
Likewise, upcasting an IXMLDOMElement
pointer to the more generic IXMLDOMNode is done as follows:
function ElementAsNode (const Elem: IXMLDOMElement): IXMLDOMNode;
begin
  //use COM QueryInterface cast
  //(a plain assignment, Result := Elem, also compiles because
  //IXMLDOMElement derives from IXMLDOMNode)
  Result := Elem as IXMLDOMNode;
end;
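Because the as cast in NodeAsElement raises an exception when the node is not an element, it can be handy to have a variant that simply reports failure. The sketch below uses the Supports function from SysUtils for that purpose. TryNodeAsElement is a hypothetical helper name, and the sketch assumes a Delphi version whose SysUtils provides this Supports overload.

```pascal
uses
  SysUtils, MSXML2_TLB;

//TryNodeAsElement is a hypothetical helper, not part of MSXML
function TryNodeAsElement (const Node: IXMLDOMNode;
  out Elem: IXMLDOMElement): Boolean;
begin
  //Supports performs the QueryInterface call and returns False
  //instead of raising when the interface is not available
  Result := Supports (Node, IXMLDOMElement, Elem);
end;
```

This is useful when iterating over child nodes of mixed types, where text or comment nodes are expected and should just be skipped.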
Conclusion
In this lesson, I've given you a basic
overview of how to programmatically use MSXML and its basic features. In the next few lessons, we'll dig into how MSXML does
the DOM, SAX, XSL, Schemas, etc.