Important: |
---|
This is retired content. This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. |
When a text file is opened with the xmlDoc.load or xmlDoc.loadXML methods (where xmlDoc is an XML DOM document), the parser strips most white space from the file unless specifically directed otherwise. Instead of storing the characters, the parser sets a flag on each node that records whether one or more spaces, tabs, newlines, or carriage returns followed the node in the text. This approach is efficient: it reduces both the size of each XML file and the number of calculations required to redisplay the XML in a browser. However, because the original characters are discarded, an XML document stored in this manner can lose formatting information in its content. Tabs, in particular, can be lost because the default mode treats them as nothing more than generic white space.
XSLT uses the XML DOM, not the source document, to guide the transformation. Because the white space has already been stripped to process the XML into the DOM, white space characters are lost even before the transformation takes place. Most of the XSLT-related methods for specifying white space in the source data document or style sheets are applied too late to make a difference in formatting.
The preserveWhiteSpace property tells the XML parser whether to keep the white space from the source file when it builds the XML DOM. The default is FALSE. To preserve white space, set the property before loading a file; otherwise, the parser strips white space characters and reduces the file to the smallest possible stream. When preserveWhiteSpace is set to TRUE, the XML document retains all of the characters within the file when converted into a DOM; when set to FALSE, the white space characters are stripped from the file.
If you change the preserveWhiteSpace property from TRUE to FALSE and then back to TRUE for a given DOM document, the spaces do not reappear. Setting the property to FALSE actually removes the white space from the DOM, which cannot reconstruct it.
If you are working with XML as a data format streamed to some other process, disable preserveWhiteSpace by setting it to FALSE or by leaving it at its default. If retaining positional information is important, for example, in conversions to non-XML formats like tab-separated data, set preserveWhiteSpace to TRUE. Be aware that this option increases the number of characters in the DOM and places more demands on the browser.
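The ordering requirement described above (set the property first, then load) can be sketched as a small JScript helper. This is a minimal sketch, not part of the original sample: the name loadWithWhiteSpace is hypothetical, and the helper only assumes that the object passed in exposes the async and preserveWhiteSpace properties and the load method of an MSXML DOMDocument.

```javascript
// Hypothetical helper (not part of the original sample): configure a
// DOMDocument-like object, then load a file into it.
function loadWithWhiteSpace(doc, url, preserve) {
    // Load synchronously so the document is fully parsed on return.
    doc.async = false;
    // The property must be assigned before load() is called; changing it
    // afterward cannot bring back white space that was already stripped.
    doc.preserveWhiteSpace = preserve;
    // load() returns true when the document was parsed successfully.
    return doc.load(url);
}

// Usage: runs only where ActiveXObject exists (Internet Explorer or
// Windows Script Host); elsewhere this branch is skipped.
if (typeof ActiveXObject !== 'undefined') {
    var objSrcTree = new ActiveXObject('Msxml2.DOMDocument');
    if (!loadWithWhiteSpace(objSrcTree, 'striporpres.xml', true)) {
        // parseError.reason describes why the load failed.
        throw new Error(objSrcTree.parseError.reason);
    }
}
```

To switch a document from preserved to stripped white space (or back), call the helper again with the other value so that the file is reloaded, rather than toggling the property on an already loaded DOM.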
To demonstrate how white space can be programmatically controlled using the DOM, see the following three documents:
- striporpres.xml
  The XML source document, which includes elements that contain different kinds of white space, including tabs, spaces, and newlines.
- striporpres.xsl
  An XSLT style sheet that makes invisible white space visible for demonstration purposes.
- striporpres.htm
  An HTML file that contains Microsoft JScript® that loads the striporpres.xml and striporpres.xsl documents into separate DOM document objects, and then alternately sets the preserveWhiteSpace property of the DOMDocument object to TRUE or FALSE. The JScript also applies the striporpres.xsl style sheet to the striporpres.xml document using the transformNode method and assigns the resulting string to the innerHTML property, which is used to display the results in Internet Explorer.
striporpres.xml
The following shows the code for the striporpres.xml document. In this document, each of the <whitespace> elements includes different kinds of white space, including tabs, spaces, and newlines.
```xml
<whitespaceTest>
	<whitespace>Tabs[		]</whitespace>
	<whitespace>Spaces[  ]</whitespace>
	<whitespace>Newlines[

	]</whitespace>
</whitespaceTest>
```
The striporpres.xml document contains no <?xml-stylesheet?> processing instruction. Instead, a style sheet is applied programmatically to the document using the transformNode method. Unlike a "pure XSLT" solution, this technique allows you to set and reset the preserveWhiteSpace property.
striporpres.xsl style sheet
The striporpres.xsl style sheet consists of two template rules. The first rule applies to the source document root node and instantiates an HTML <pre> element node in the result tree. The second rule transforms the <whitespace> elements, and its output is placed inside that <pre> node. As in the <xsl:preserve-space> and <xsl:strip-space> example, the XML Path Language (XPath) translate() function is used to make the invisible white space visible.
```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
    <pre><xsl:apply-templates /></pre>
</xsl:template>

<xsl:template match="whitespace">
    <!-- Use the translate() XPath function to map space, newline,
         carriage return, and tab to the visible characters -, N, R, and T. -->
    <xsl:value-of select="translate(.,' &#10;&#13;&#9;','-NRT')"/>
</xsl:template>

</xsl:stylesheet>
```
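To make the effect of the translate() call easier to follow, its character mapping can be mimicked in JScript. The following is only an illustrative emulation, not something the sample itself uses: the function name translateChars is hypothetical, and the sketch assumes every character fits in a single UTF-16 code unit, which holds for the white space characters involved here.

```javascript
// Hypothetical emulation of the XPath translate(string, from, to) function.
// Each character of str found in `from` is replaced by the character at
// the same position in `to`; if `to` is shorter, the character is removed.
function translateChars(str, from, to) {
    var out = '';
    for (var i = 0; i < str.length; i++) {
        var ch = str.charAt(i);
        var idx = from.indexOf(ch);   // first occurrence in `from` wins
        if (idx === -1) {
            out += ch;                // not listed: copy unchanged
        } else if (idx < to.length) {
            out += to.charAt(idx);    // listed: substitute
        }
        // listed but without a counterpart in `to`: dropped
    }
    return out;
}

// Same mapping as the style sheet: space, newline, carriage return, tab.
var visible = translateChars('Newlines[\n\n\t]', ' \n\r\t', '-NRT');
// visible is now 'Newlines[NNT]'
```

Applied to the other two elements, the same call turns two tabs into TT and two spaces into --, which matches the strings shown in the Results section.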
striporpres.htm
The striporpres.htm file contains a JScript function, preserveStripPreserve, that is called when the file is loaded. The preserveStripPreserve function first sets the preserveWhiteSpace property, and then loads striporpres.xml into the objSrcTree DOMDocument object.

```html
<html>
<head>
<title>Demo: Controlling White Space Output via the DOM</title>
<script language='JScript'>
<!--
function preserveStripPreserve() {
    // Associate the result tree object with any element(s) whose
    // id attribute is "testResults."
    var objResTree = document.getElementsByTagName("*")['testResults'];

    // Declare two new Msxml DOMDocument objects.
    var objSrcTree = new ActiveXObject('Msxml2.DOMDocument');
    var objXSLT = new ActiveXObject('Msxml2.DOMDocument');

    // Load synchronously so that both documents are fully parsed
    // before transformNode is called.
    objSrcTree.async = false;
    objXSLT.async = false;

    // Load the two DOMDocuments with the XML document and the
    // XSLT style sheet. preserveWhiteSpace is set before the load.
    objSrcTree.preserveWhiteSpace = true;
    objSrcTree.load('stripOrPres.xml');
    objXSLT.load('stripOrPres.xsl');

    // Use the transformNode method to apply the XSLT to the XML.
    var strResult = objSrcTree.transformNode(objXSLT);

    // Now reset preserveWhiteSpace to false, and re-load the source...
    objSrcTree.preserveWhiteSpace = false;
    objSrcTree.load('stripOrPres.xml');

    // ...and rerun the transform. Note the concatenation of this
    // transformation's result tree to the one created when
    // preserveWhiteSpace was true.
    strResult = strResult + objSrcTree.transformNode(objXSLT);

    // Assign the resulting string to the result tree object's
    // innerHTML property.
    objResTree.innerHTML = strResult;
    return true;
}
-->
</script>
</head>
<body onload='preserveStripPreserve()'>
<div id='testResults'></div>
</body>
</html>
```
Results
In Internet Explorer, the striporpres.htm file appears as follows.
This block shows the results from setting the preserveWhiteSpace property to TRUE.
Tabs[TT]
Spaces[--]
Newlines[NNT]
This block shows the results from setting the preserveWhiteSpace property to FALSE.
Tabs[TT]Spaces[--]Newlines[NNT]
In the first block, the contents of each of the original <whitespace> elements appear on a separate line. In the second block, where the preserveWhiteSpace property is set to FALSE, all contents of these elements appear on a single line. In addition, the first block is indented, while the second block is not. The indents and line breaks in the first block are a result of newline and tab characters that sit between the <whitespace> elements in the source document. This white space is affected by the preserveWhiteSpace property.
The white space within the <whitespace> elements is not affected by the value of the preserveWhiteSpace property. To remove white-space-only text children of elements, use the XML Path Language (XPath) normalize-space() function.
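The behavior of normalize-space() can be illustrated with a short JScript emulation. This is a sketch for illustration only: the name normalizeSpace is hypothetical, and the function simply mirrors the XPath definition, which trims leading and trailing white space and collapses each internal run of spaces, tabs, carriage returns, and newlines into a single space.

```javascript
// Hypothetical emulation of the XPath normalize-space() function.
function normalizeSpace(str) {
    return str
        .replace(/[ \t\r\n]+/g, ' ')   // collapse each run to one space
        .replace(/^ | $/g, '');        // trim the leading/trailing space
}

// A white-space-only text node, such as the newline and tab between two
// <whitespace> elements, normalizes to the empty string.
var between = normalizeSpace('\n\t');          // ''
// Runs inside element content are collapsed as well.
var inside = normalizeSpace('Spaces[  ]');     // 'Spaces[ ]'
```

Because a white-space-only string normalizes to an empty string, a test on the normalized value distinguishes such text nodes from nodes with real content. Note that the function also alters white space inside content, as the second call shows.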
Note: |
---|
Both of the "Newlines" blocks in the result tree contain a newline-newline-tab sequence, NNT, even though the corresponding <whitespace> element in the XML source document appeared to include only a pair of newlines. The extra tab is in the source document between the second newline and the ] character. |