Capt. Horatio T.P. Webb
 
ROLLING OUT XML TO THE HTML PAGE
(XML CROSS BROWSER ISSUES)
Parks -- Summer 2014

Version 1 -- 6/29/2014

 

Processing an XML tree for the purpose of displaying its content on a web page can be done several ways -- depending on the browser. This page will cover three alternatives:

  1. the MS IE approach using: some_object_name = new ActiveXObject("Microsoft.XMLDOM");
  2. the cross browser javascript loop processing using DOMParser
  3. the cross browser javascript recursion using DOMParser

The XML example used on this page is based on the following DTD:

 <?xml version="1.0"?>
 <!DOCTYPE marx_brothers_movies [
 <!ELEMENT marx_brothers_movies (movie+)>
 <!ELEMENT movie (title, year,role+)>
 <!ELEMENT title (#PCDATA)>
 <!ELEMENT year (#PCDATA)>
 <!ELEMENT role (actor, character)>
 <!ELEMENT actor (#PCDATA)>
 <!ELEMENT character (#PCDATA)>
 ]>

The root tag (named marx_brothers_movies) contains multiple movie tags.

Each movie contains the children:

0. title
1. year
and 2 through root.childNodes[i].childNodes.length-1: the role tags
each role tag has two children: actor and character

The purpose of the javascript is to make a table of movies with: title and year in a row; and a row for each actor and character.
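
For illustration, a document that conforms to the DTD above might look like the following. This is a hypothetical sample (the variable name sampleXml is an assumption), not necessarily the text used in the textarea on this page. It is written with no whitespace between the tags, so the child positions are 0 for title, 1 for year, and 2 onward for the roles.

```javascript
// Hypothetical sample document conforming to the DTD above.
// No whitespace is placed between tags, so the child indexes
// are the same in every parser.
var sampleXml =
  '<?xml version="1.0"?>' +
  '<marx_brothers_movies>' +
    '<movie>' +
      '<title>Duck Soup</title>' +
      '<year>1933</year>' +
      '<role><actor>Groucho Marx</actor><character>Rufus T. Firefly</character></role>' +
      '<role><actor>Chico Marx</actor><character>Chicolini</character></role>' +
    '</movie>' +
    '<movie>' +
      '<title>A Night at the Opera</title>' +
      '<year>1935</year>' +
      '<role><actor>Groucho Marx</actor><character>Otis B. Driftwood</character></role>' +
    '</movie>' +
  '</marx_brothers_movies>';
```

From this sample the table would have one shaded row per movie (title and year) followed by one row per role (actor and character).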

  1. The MS IE Approach

    In 1999 Microsoft added XML support to Internet Explorer with Version 5. This addition utilized:

    some_object_name = new ActiveXObject("Microsoft.XMLDOM");

    If you are using IE you can click the first button labelled "IE XML DOM" to see the sample XML processed into a table. The javascript in the function go1():

    1. retrieves a string containing XML from a textarea named ta
    2. loads the string into the XMLDOM with the object name xmlDoc1: xmlDoc1.loadXML(x);
    3. sets the root node: var root=xmlDoc1.documentElement;
    4. starts the table and writes the title row
    5. creates two loops to process the XML
    6. the javascript code for go1 is:

      function go1() // *** parse with Microsoft .XMLDOM
      {
       
      x=document.f1.ta.value;
      //
      // *** detect MS IE or NOT
      //
          var iev=0;
          var ieold = (/MSIE (\d+\.\d+);/.test(navigator.userAgent));
          var trident = !!navigator.userAgent.match(/Trident\/7.0/);
          var rv=navigator.userAgent.indexOf("rv:11.0");
      
          if (ieold) iev=new Number(RegExp.$1);
          if (navigator.appVersion.indexOf("MSIE 10") != -1) iev=10;
          if (trident&&rv!=-1) iev=11;
      
      if (iev !=0)
        {
         xmlDoc1=new ActiveXObject("Microsoft.XMLDOM");
         xmlDoc1.async=false;
         xmlDoc1.loadXML(x);
         var root=xmlDoc1.documentElement;
         os2="<table border='1'><tr><td colspan='2'><center><b>MARX BROTHERS MOVIES</b></center></td></tr>";
         for (i=0;i<root.childNodes.length;i++) // *** movies loop
             {
               os2=os2+"<tr><td colspan='2' bgcolor='#cccccc'><b>"+root.childNodes[i].childNodes[0].text;
               os2=os2+" ("+root.childNodes[i].childNodes[1].text+")</b></td></tr>";
               for (j=2;j<root.childNodes[i].childNodes.length;j++) // *** roles loop
                   {
                     os2=os2+"<tr><td>"+root.childNodes[i].childNodes[j].childNodes[0].text+"</td>";
                     os2=os2+"<td>"+root.childNodes[i].childNodes[j].childNodes[1].text+"</td></tr>";
                   }
             }
         document.getElementById("xout1").innerHTML=os2+"</table>"; 
         }
      else
         alert ("Browser does NOT support Microsoft.XMLDOM");
      }

      Note the browser detection code at the beginning of go1:

      //
      // *** detect MS IE or NOT
      //
          var iev=0;
          var ieold = (/MSIE (\d+\.\d+);/.test(navigator.userAgent));
          var trident = !!navigator.userAgent.match(/Trident\/7.0/);
          var rv=navigator.userAgent.indexOf("rv:11.0");
      
          if (ieold) iev=new Number(RegExp.$1);
          if (navigator.appVersion.indexOf("MSIE 10") != -1) iev=10;
          if (trident&&rv!=-1) iev=11;
      
      if (iev !=0) // if "iev" is NOT zero it is a version of MS Internet Explorer
                   // if "iev" is zero it is NOT MS Internet Explorer
      .
      .
      .
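
The same detection logic can be sketched as a small function that takes the user-agent string as a parameter, which makes it easy to try against known strings. The function name detectIEVersion is an assumption for illustration; it is not part of go1.

```javascript
// Hypothetical helper: returns the IE major version found in a user-agent
// string, or 0 when the string does not look like Internet Explorer.
function detectIEVersion(ua) {
  // IE 10 and earlier announce themselves as "MSIE <version>;"
  var old = /MSIE (\d+)\.\d+;/.exec(ua);
  if (old) return parseInt(old[1], 10);
  // IE 11 dropped the MSIE token; it reports "Trident/7.0" and "rv:11.0"
  if (/Trident\/7\.0/.test(ua) && ua.indexOf("rv:11.0") !== -1) return 11;
  return 0;
}
```

In a page it would be called as detectIEVersion(navigator.userAgent), and a result of 0 would take the "NOT MS Internet Explorer" branch.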
      

  2. The DOMParser Approach

    In non-IE browsers (and in IE 9 and later), the DOMParser can be used. The javascript function go2:

    1. retrieves a string containing XML from a textarea named ta
    2. removes all extraneous bytes between the tags -- like: carriage returns, line feeds and tabs.
    3. loads the string into the DOMParser with the object name xmlDoc: xmlDoc=parser.parseFromString(x,"text/xml");
    4. sets the root node: root=xmlDoc.documentElement;
    5. creates two loops to process the XML
    6. the javascript code for go2 is:

      function go2() //*** uses DOMParser but uses loops instead of a tree parser
      {
         x=document.f1.ta.value;
         //
         //  *** remove carriage returns, line feeds and tabs from the string (ASCII chars 13, 10 and 9)
         //
         cr=String.fromCharCode(13);
         lf=String.fromCharCode(10);
         tb=String.fromCharCode(9);
         while (x.indexOf(cr)>-1)
                x=x.replace(cr,"");
         while (x.indexOf(lf)>-1)
                x=x.replace(lf,"");
         while (x.indexOf(tb)>-1)
                x=x.replace(tb,"");
        
         parser=new DOMParser();
         xmlDoc=parser.parseFromString(x,"text/xml");
         root=xmlDoc.documentElement;
         os2="<table border='1'><tr><td colspan='2'><center><b>MARX BROTHERS MOVIES</b></center></td></tr>";
         for (i=0;i<root.childNodes.length;i++) // *** movies loop
             {
              os2=os2+"<tr><td colspan='2' bgcolor='#cccccc'><b>"+root.childNodes[i].childNodes[0].childNodes[0].nodeValue;
              os2=os2+" ("+root.childNodes[i].childNodes[1].childNodes[0].nodeValue+")</b></td></tr>";
              for (j=2;j<root.childNodes[i].childNodes.length;j++) // *** roles loop
                  {
                   os2=os2+"<tr>";
                   os2=os2+"<td>"+root.childNodes[i].childNodes[j].childNodes[0].childNodes[0].nodeValue+"</td>";
                   os2=os2+"<td>"+root.childNodes[i].childNodes[j].childNodes[1].childNodes[0].nodeValue+"</td>";
                   os2=os2+"</tr>"
                  }
             }
         document.getElementById("xout2").innerHTML=os2+"</table>"; 
      }
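
Step 2 of go2 removes carriage returns, line feeds and tabs, but it leaves in place any spaces used to indent the tags. One possible alternative, sketched here under the assumption that the XML has no mixed content or CDATA sections, is to remove only the whitespace that sits between a closing ">" and the next "<". The function name stripInterTagWhitespace is hypothetical.

```javascript
// Hypothetical helper: removes whitespace-only runs that sit between two tags
// and leaves whitespace inside element content untouched.
function stripInterTagWhitespace(xml) {
  return xml.replace(/>\s+</g, "><");
}

var cleaned = stripInterTagWhitespace(
  "<movie>\r\n\t<title>Duck Soup</title>\n  <year>1933</year>\n</movie>"
);
// cleaned is "<movie><title>Duck Soup</title><year>1933</year></movie>"
```

The space inside "Duck Soup" survives because it is not directly between two tags.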
      

    The difference between the XMLDOM and the DOMParser is twofold:

    1. The MS XMLDOM ignores any extraneous whitespace between the tags, but the DOMParser does not: it keeps this whitespace as text nodes, which shifts the childNodes indexes. So, to remove the whitespace, go2 removes the carriage returns, line feeds and tabs BEFORE the data is loaded into the DOM.

    2. The DOMParser treats the data (i.e., content) of any tag as a child node. The MS XMLDOM exposes the content through a property called ".text"; with the DOMParser the content is read from ".nodeValue". But the MS XMLDOM's .text is a property of the node itself, whereas the DOMParser's .nodeValue is read from the first (and only) child of a node. Thus, to get the data for the ith node's jth child:

      With the MS XMLDOM, the node's content is:

           root.childNodes[i].childNodes[j].text

      which is the same as the DOMParser's node content:

           root.childNodes[i].childNodes[j].childNodes[0].nodeValue

      This issue originates with the node property called ".nodeType". There are two primary types of nodes in XML:

      1. nodeType 1 for element nodes (tags), which may have child tags or a data node
      2. nodeType 3 for text nodes, which hold the content of a tag (i.e., .nodeValue)

      So, the DOMParser stores the content of a tag as a node of its own (the first and only child of a node that has data).

      Otherwise the algorithms 1 and 2 are the same.
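
The difference between ".text" and ".nodeValue" can be hidden behind one small helper. The sketch below uses plain objects as stand-ins for parsed nodes so that it runs anywhere; the helper name textOf and both sample objects are assumptions for illustration.

```javascript
// Hypothetical helper: returns the text content of a tag in either DOM.
function textOf(node) {
  // MS XMLDOM: the content is a property of the node itself
  if (typeof node.text === "string") return node.text;
  // DOMParser: the content is the nodeValue of the first child (nodeType 3)
  var first = node.childNodes && node.childNodes[0];
  return first && first.nodeType === 3 ? first.nodeValue : "";
}

// Plain objects standing in for parsed nodes (not real DOM objects).
var domParserStyle = {
  nodeType: 1, tagName: "title",
  childNodes: [{ nodeType: 3, nodeValue: "Duck Soup" }]
};
var xmlDomStyle = { nodeType: 1, tagName: "title", text: "Duck Soup", childNodes: [] };

var a = textOf(domParserStyle); // "Duck Soup"
var b = textOf(xmlDomStyle);    // "Duck Soup"
```

In current browsers the standard textContent property of a node returns the same string for a tag that holds only text.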

  3. Recursive parsing using the DOMParser

    Rather than using javascript loops to control the production of the HTML table, one can utilize a recursive javascript algorithm to parse the entire XML tree and output the appropriate HTML code when specific tags are encountered.

    The recursive version starts with the javascript function go3:

    function go3() // *** tree recursion version using DOMParser (uses treeWalk below)
    {
       x=document.f1.ta.value; // get the XML text string
       //
       //  *** remove carriage returns, line feeds and tabs from the string (ASCII chars 13, 10 and 9)
       //
       cr=String.fromCharCode(13);
       lf=String.fromCharCode(10);
       tb=String.fromCharCode(9);
       while (x.indexOf(cr)>-1)
              x=x.replace(cr,"");
       while (x.indexOf(lf)>-1)
              x=x.replace(lf,"");
       while (x.indexOf(tb)>-1)
              x=x.replace(tb,"");
       os="";
       parser=new DOMParser();                         // *** create a new instance of the DOMParser
       xmlDoc=parser.parseFromString(x,"text/xml");    // *** load the XML string
       root=xmlDoc.documentElement;                    // *** set the root
       // *** create the table and the table header
       os="<table border='1'><tr><td colspan='2'><center><b>MARX BROTHERS MOVIES</b></center></td></tr>";
       // *** feed the root node to treeWalk
       treeWalk(root,true);
       // *** wrap up the table and display the output.
       document.getElementById("xout3").innerHTML=os+"</table>"; 
    }
    

    Once the string data has been loaded into the DOM, the javascript code employs three functions to process and output the XML tree to HTML.

    1. the entry point function treeWalk

      To start the processing of any branch, treeWalk is passed the parent of the branch as an object (the first time, treeWalk is passed the root tag, i.e., documentElement). If the node has children, the list of child nodes is retrieved and passed to the child node process named loopChildren. Further, a Boolean is provided to control whether or not output is to be produced.

      function treeWalk(this_node,output)
      {
          var nodes;
          if(this_node.childNodes) // *** this node has children
            {
              nodes = this_node.childNodes;
              loopChildren(nodes,output);
            }
      }
      

    2. the child processing function loopChildren

      The childNodes array is received as input by loopChildren. The loop processes each child by:

      1. handing the node to processNode for output
      2. if the child in the loop has children, handing the child to treeWalk so that its children are processed before the next sibling is processed (this is the recursive part of the code: treeWalk calls loopChildren, which calls treeWalk again, until all the children are processed).

      function loopChildren(nodes,output)
      {
          var node;
          for (var i=0;i<nodes.length;i++)
              {
                node = nodes[i];
                if(output)
                    processNode(node);
                if(node.childNodes)    
                   treeWalk(node,output); //   this is the recursion
                                          //   if a child has children we must process
                                          //   the children (all of them) before we
                                          //   proceed to the next sibling
              }
      }
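
The order in which the two functions visit the nodes can be seen with a small stand-alone sketch. It uses plain objects in place of parsed nodes; the names el, txt, walk, visitChildren and visited are assumptions for illustration, with walk playing the role of treeWalk and visitChildren the role of loopChildren.

```javascript
// Hypothetical stand-ins for parsed nodes: el() builds a tag (nodeType 1),
// txt() builds a data node (nodeType 3).
function el(tagName, children) {
  return { nodeType: 1, tagName: tagName, childNodes: children };
}
function txt(value) {
  return { nodeType: 3, nodeValue: value, childNodes: [] };
}

var visited = [];

function walk(node) {                 // plays the role of treeWalk
  visitChildren(node.childNodes);
}
function visitChildren(nodes) {       // plays the role of loopChildren
  for (var i = 0; i < nodes.length; i++) {
    var n = nodes[i];
    visited.push(n.nodeType === 1 ? n.tagName : '"' + n.nodeValue + '"');
    walk(n);                          // descend before moving to the next sibling
  }
}

var tree = el("movie", [
  el("title", [txt("Duck Soup")]),
  el("year",  [txt("1933")])
]);
walk(tree);
// visited is now: title, "Duck Soup", year, "1933"
```

Note that the node handed to walk first (here movie, on the page the root tag) is never itself recorded; only its descendants are, which matches the way treeWalk never passes the root to processNode.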
      

    3. the output process for each node processNode

      This function performs the output task for each node. It first determines whether this is a tag (nodeType is 1) or data (nodeType is 3). If a tag, it checks the .tagName and performs any necessary output tasks before the tag's content is output. If the node is data (i.e., has a nodeValue), the content is output, then any necessary output which must follow the data is produced.

      function processNode(node)
      {
       
          if(node.nodeType === 1) // *** this node is a tag NOT contents of a tag
          {
            // *** code for processing various tags (what to do when the tag is encountered)
            //     these tags typically have children
            //     code performs actions to be done BEFORE the content is presented
            if (node.tagName=="movie")
               {
                 os=os+"<tr bgcolor='#cccccc'><td colspan='2'><b>";
               }
            if (node.tagName=="role")
               {
                os=os+"<tr>";
               } 
          }
          else
          {
            // *** code for displaying the content of tag
            //     also handles the display of what follows the content
            //
            if(node.nodeType === 3) // *** this node is the content (i.e., text) NOT a tag
              {
                if(node.nodeValue)
                 {
                   // ** various code depending on the tagName of the content
                   if (node.parentNode.tagName=="title") os=os+node.nodeValue;
                   if (node.parentNode.tagName=="year") os=os+" ("+node.nodeValue+")</td></tr>";
                   if (node.parentNode.tagName=="actor") os=os+"<td>"+node.nodeValue+"</td>";
                   if (node.parentNode.tagName=="character") os=os+"<td>"+node.nodeValue+"</td></tr>";
                 } 
               }        
           }   
      }
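
As a variation on the three functions above, the whole walk can be written as one recursive function that returns the HTML string instead of appending to the global os. This is only a sketch under stated assumptions: it runs on plain objects standing in for parsed nodes, and the names makeElement, makeText, before, around and render are hypothetical. The lookup tables replace the chain of if statements in processNode.

```javascript
// Hypothetical stand-ins for parsed nodes (not real DOM objects).
function makeElement(tagName, children) {
  var node = { nodeType: 1, tagName: tagName, childNodes: children };
  for (var i = 0; i < children.length; i++) children[i].parentNode = node;
  return node;
}
function makeText(value) {
  return { nodeType: 3, nodeValue: value, childNodes: [] };
}

// Markup emitted when a tag opens, and markup placed around the data of a tag.
var before = { movie: "<tr bgcolor='#cccccc'><td colspan='2'><b>", role: "<tr>" };
var around = {
  title:     ["", ""],
  year:      [" (", ")</b></td></tr>"],
  actor:     ["<td>", "</td>"],
  character: ["<td>", "</td></tr>"]
};

function render(node) {
  var out = "";
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === 1) {
      out += before[child.tagName] || "";            // tag: output that precedes the content
    } else if (child.nodeType === 3 && around[child.parentNode.tagName]) {
      var wrap = around[child.parentNode.tagName];   // data: content plus what follows it
      out += wrap[0] + child.nodeValue + wrap[1];
    }
    out += render(child); // recursion: finish this child's subtree before the next sibling
  }
  return out;
}

var sampleRoot = makeElement("marx_brothers_movies", [
  makeElement("movie", [
    makeElement("title", [makeText("Duck Soup")]),
    makeElement("year",  [makeText("1933")]),
    makeElement("role", [
      makeElement("actor",     [makeText("Groucho Marx")]),
      makeElement("character", [makeText("Rufus T. Firefly")])
    ])
  ])
]);
var html = render(sampleRoot);
```

With a real document, render would be handed xmlDoc.documentElement, and the result would be placed between the table header row and the closing table tag.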
      

Here is the example output:

  1. column 1 shows the XML as a string in a textarea
  2. column 2 has a button that produces the output using the IE XMLDOM (if applicable -- i.e., if your browser is IE)
  3. column 3 has a button that produces the output using the DOMParser (all browsers)
  4. column 4 has a button that produces the output using the DOMParser recursion (all browsers)

1. XML string stored in a textarea named ta