Identity transform

From HandWiki
Revision as of 09:28, 31 July 2022 by imported>WikiGary (linkage)
Short description: Form of data transformation

The identity transform is a data transformation that copies the source data into the destination data without change.

The identity transformation is considered an essential process in creating a reusable transformation library. By creating a library of variations of the base identity transformation, a variety of data transformation filters can be easily maintained. These filters can be chained together in a format similar to UNIX shell pipes.
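The idea of a filter library built on the identity transform can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustrative analogue, not code from the article. The names make_filter, chain, rename_item and mark_entry are hypothetical. Each filter is the identity copy plus one small local change, and chain applies the filters in order, the way a shell pipe feeds one command's output into the next.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable

Filter = Callable[[ET.Element], ET.Element]


def make_filter(edit: Callable[[ET.Element], None]) -> Filter:
    """Build a filter from the identity copy plus one local change (`edit`)."""
    def run(elem: ET.Element) -> ET.Element:
        result = ET.Element(elem.tag, dict(elem.attrib))  # identity: copy tag and attributes
        result.text, result.tail = elem.text, elem.tail   # identity: copy character data
        for child in elem:
            result.append(run(child))                     # identity: recurse into children
        edit(result)                                      # the only non-identity step
        return result
    return run


def chain(*filters: Filter) -> Filter:
    """Apply filters left to right, like `f1 | f2` in a UNIX shell pipeline."""
    return lambda tree: reduce(lambda acc, f: f(acc), filters, tree)


def rename_item(e: ET.Element) -> None:
    # hypothetical edit: rename <item> to <entry>
    if e.tag == "item":
        e.tag = "entry"


def mark_entry(e: ET.Element) -> None:
    # hypothetical edit: flag every <entry>
    if e.tag == "entry":
        e.set("checked", "yes")


pipeline = chain(make_filter(rename_item), make_filter(mark_entry))
source = ET.fromstring("<list><item>a</item><item>b</item></list>")
print(ET.tostring(pipeline(source), encoding="unicode"))
# prints <list><entry checked="yes">a</entry><entry checked="yes">b</entry></list>
```

As with shell pipes, order matters: running mark_entry before rename_item would find no entry elements to mark. The input tree is never modified, because every stage builds a fresh copy.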

Examples of recursive transforms

The "copy with recursion" approach makes it possible to produce entirely new and different output, filtering or updating the input, by changing only small portions of code. Understanding the identity transform as a recursive copy is therefore the key to understanding such filters.

Using XSLT

The most frequently cited example of the identity transform (for XSLT version 1.0) is the "copy.xsl" transform. It uses the xsl:copy instruction[1] to perform the identity transformation:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

This template works by matching all attributes (@*) and all other nodes (node()), copying each matched node, and then applying the same template to the attributes and child nodes of the context node. This recursively descends the tree and reproduces it with the same structure it had in the original file, within the limits of what information is considered significant in the XPath data model. Since node() matches text nodes, processing instructions and comments as well as elements, and the root node is handled by XSLT's built-in template rule (which simply processes its children), all XML nodes are copied.
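For readers more familiar with general-purpose languages, the same recursive structure can be sketched in Python with the standard xml.etree.ElementTree module. This is an analogue, not an XSLT equivalent: ElementTree keeps character data on an element's .text and .tail attributes instead of as separate text nodes, and its default parser discards comments and processing instructions. The function name identity is hypothetical.

```python
import xml.etree.ElementTree as ET


def identity(elem: ET.Element) -> ET.Element:
    """Return a recursive copy of `elem`, loosely mirroring the XSLT identity template."""
    # like xsl:copy plus @*: same tag, copied attributes
    result = ET.Element(elem.tag, dict(elem.attrib))
    # ElementTree stores text before the first child on .text
    # and text following this element on .tail
    result.text = elem.text
    result.tail = elem.tail
    # like xsl:apply-templates on the children: copy each one recursively
    for child in elem:
        result.append(identity(child))
    return result


source = ET.fromstring('<person id="7"><name>Ann</name> <city>Oslo</city></person>')
copied = identity(source)
print(ET.tostring(copied, encoding="unicode"))
# prints <person id="7"><name>Ann</name> <city>Oslo</city></person>
```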

A more explicit version of the identity transform is:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|*|processing-instruction()|comment()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()|processing-instruction()|comment()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

This version is equivalent to the first, but explicitly enumerates the types of XML nodes that it will copy; the text nodes it selects are output by XSLT's built-in template rule for text. Both versions copy data that is unnecessary for most XML uses (e.g., comments).

XSLT 3.0

XSLT 3.0[2] specifies an on-no-match attribute of the xsl:mode declaration that allows the identity transform to be declared rather than implemented as an explicit template rule. Specifically:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:mode on-no-match="shallow-copy" />
</xsl:stylesheet>

is essentially equivalent to the earlier template rules. See the XSLT 3.0 standard's description of shallow-copy[3] for details.
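The effect of declaring a default behaviour, instead of writing the identity template by hand, can be sketched in Python: a "mode" holds explicit rules keyed by element name, and any element without a rule is shallow-copied while its children continue to be processed in the same mode. This is a loose analogue under stated assumptions: make_mode and mask are hypothetical names, rules are selected by tag only, and attributes are copied directly rather than being processed by rules as in XSLT.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

Rule = Callable[[ET.Element], ET.Element]


def make_mode(rules: Dict[str, Rule]) -> Rule:
    """Transform in which elements with no matching rule are shallow-copied,
    loosely mirroring <xsl:mode on-no-match="shallow-copy"/>."""
    def apply(elem: ET.Element) -> ET.Element:
        rule = rules.get(elem.tag)
        if rule is not None:
            return rule(elem)  # an explicit rule takes precedence
        # no match: copy this node only, then keep processing its children
        result = ET.Element(elem.tag, dict(elem.attrib))
        result.text, result.tail = elem.text, elem.tail
        for child in elem:
            result.append(apply(child))
        return result
    return apply


def mask(elem: ET.Element) -> ET.Element:
    """Hypothetical rule: keep the element but replace its content."""
    masked = ET.Element(elem.tag, dict(elem.attrib))
    masked.text, masked.tail = "***", elem.tail
    return masked


identity = make_mode({})  # no rules at all: behaves as the identity transform
redact = make_mode({"PersonSSNID": mask})

doc = ET.fromstring("<person><name>Ann</name><PersonSSNID>123</PersonSSNID></person>")
print(ET.tostring(identity(doc), encoding="unicode"))
print(ET.tostring(redact(doc), encoding="unicode"))
```

With an empty rule set the mode is the identity transform, so customising it means adding rules rather than rewriting the copy logic.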

Finally, note that markup details, such as the use of CDATA sections or the order of attributes, are not necessarily preserved in the output, since this information is not part of the XPath data model. To produce CDATA sections in the output, the stylesheet that contains the identity transform template (not the identity transform template itself) should use the cdata-section-elements attribute of xsl:output.

The cdata-section-elements attribute specifies a list of the names of elements whose text node children should be output using CDATA sections.[1] For example:

<xsl:output method="xml" encoding="utf-8" cdata-section-elements="element-name-1 element-name-2"/>

Using XQuery

XQuery can define recursive functions. The following example XQuery function copies the input directly to the output without modification.


declare function local:copy($element as element()) {
  element {node-name($element)}
    {$element/@*,
     for $child in $element/node()
        return if ($child instance of element())
          then local:copy($child)
          else $child
    }
};


The same result can also be achieved with a typeswitch-style transform:

xquery version "1.0";

(: copy the input to the output without modification :)
declare function local:copy($input as item()*) as item()* {
for $node in $input
   return 
      typeswitch($node)
        case document-node()
           return
              document {
                local:copy($node/node())
              }
        case element()
           return
              element {name($node)} {

                (: output each attribute in this element :)
                for $att in $node/@*
                   return
                      attribute {name($att)} {$att}
                ,
                (: output all the child nodes of this element recursively :)
                for $child in $node/node()
                   return local:copy($child)

              }
        (: otherwise pass it through.  Used for text(), comments, and PIs :)
        default return $node
};

The typeswitch transform is sometimes preferable because it can be modified simply by adding a case clause for any element that needs special processing.
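The typeswitch approach, dispatching on the kind of node, can be sketched in Python with the standard xml.dom.minidom module, which, unlike ElementTree, represents documents, elements, text, comments and processing instructions as distinct node types. This is an illustrative analogue rather than a translation: copy_node is a hypothetical name, namespaces receive no special treatment, and a document containing a DOCTYPE would need an extra branch because minidom's importNode does not accept document type nodes.

```python
from xml.dom import minidom, Node


def copy_node(node, out=None):
    """Typeswitch-style identity copy: dispatch on the node's type."""
    if node.nodeType == Node.DOCUMENT_NODE:
        # case document-node(): build a new document and copy its children into it
        doc = minidom.Document()
        for child in node.childNodes:
            doc.appendChild(copy_node(child, doc))
        return doc
    if node.nodeType == Node.ELEMENT_NODE:
        # case element(): same name, copied attributes, children copied recursively
        new = out.createElement(node.tagName)
        for name, value in node.attributes.items():
            new.setAttribute(name, value)
        for child in node.childNodes:
            new.appendChild(copy_node(child, out))
        return new
    # default: pass text, comments and PIs through as equivalent nodes owned by `out`
    return out.importNode(node, False)


src = minidom.parseString('<a x="1"><b>hi</b><!-- note --><?pi data?></a>')
print(copy_node(src).toxml())
```

Special handling for one element name, the counterpart of an extra case clause, would be one more if branch placed before the generic element branch.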

Non-recursive transforms

The following are two simple, illustrative "copy all" transforms; neither needs an explicitly recursive template rule.

Using XSLT

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
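A comparable one-step copy in Python is a deep copy of an ElementTree element: the recursion still happens, but inside the library call rather than in user code. This is an analogue, not an equivalent of xsl:copy-of: ElementTree's default parser discards comments and processing instructions, so those would not survive the round trip.

```python
import copy
import xml.etree.ElementTree as ET

source = ET.fromstring('<doc><p class="intro">Hello</p><p>World</p></doc>')

# one call copies the whole tree, much as xsl:copy-of select="." does
clone = copy.deepcopy(source)

clone[0].set("class", "changed")  # modifying the copy leaves the source untouched
print(ET.tostring(source, encoding="unicode"))
# prints <doc><p class="intro">Hello</p><p>World</p></doc>
```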

Using XProc

<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/ns/xproc">
  <p:identity/>
</p:pipeline>

Note that the XProc p:identity step can accept either a single document, as in this example, or a sequence of documents as input.

More complex examples

Generally the identity transform is used as a base on which one can make local modifications.

Remove named element transform

Using XSLT

The identity transformation can be modified to copy everything from an input tree to an output tree except a given node. For example, the following will copy everything from the input to the output except the social security number:

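<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity template: copy everything by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- remove all social security numbers -->
  <xsl:template match="PersonSSNID"/>
</xsl:stylesheet>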
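The same "identity plus one exception" pattern can be sketched in Python with xml.etree.ElementTree. This is an illustrative analogue; copy_without is a hypothetical name. One ElementTree-specific detail needs care: the text that follows an element is stored on that element's .tail, so dropping the element would silently drop the text too unless it is moved to the previous sibling or the parent. Tags of namespaced elements have the form {uri}local, so a plain name such as PersonSSNID only matches elements in no namespace, as the XSLT pattern does. As in the XQuery version, the top-level element itself is never removed.

```python
import xml.etree.ElementTree as ET
from typing import Set


def copy_without(elem: ET.Element, drop: Set[str]) -> ET.Element:
    """Identity copy of `elem` that omits every descendant element whose tag is in `drop`."""
    result = ET.Element(elem.tag, dict(elem.attrib))
    result.text, result.tail = elem.text, elem.tail
    for child in elem:
        if child.tag in drop:
            # ElementTree stores the text after an element on its .tail;
            # move it to the previous sibling (or the parent) so it is kept.
            if child.tail:
                if len(result):
                    result[-1].tail = (result[-1].tail or "") + child.tail
                else:
                    result.text = (result.text or "") + child.tail
            continue
        result.append(copy_without(child, drop))
    return result


source = ET.fromstring("<person><name>Ann</name><PersonSSNID>123</PersonSSNID> (verified)</person>")
print(ET.tostring(copy_without(source, {"PersonSSNID"}), encoding="unicode"))
# prints <person><name>Ann</name> (verified)</person>
```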

Using XQuery

declare function local:copy-filter-elements($element as element(), 
   $element-name as xs:string*) as element() {
   element {node-name($element) }
             { $element/@*,
               for $child in $element/node()[not(name(.)=$element-name)]
                  return if ($child instance of element())
                    then local:copy-filter-elements($child,$element-name)
                    else $child
           }
 };

To call this function, one could write:

let $filtered-output := local:copy-filter-elements($input, 'PersonSSNID')
return $filtered-output

Using XProc

<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/ns/xproc">
  <p:identity/>
  <p:delete match="PersonSSNID"/>
</p:pipeline>


Further reading

  • Sal Mangano, XSLT Cookbook, O'Reilly Media, Inc., December 1, 2002, ISBN 0-596-00372-2
  • Priscilla Walmsley, XQuery, O'Reilly Media, Inc., Chapter 8, "Functions", section "Recursive Functions", p. 109

References