XMLProbe provides a number of built-in extension functions to XPath 1.0. These are documented in this section.
Returns the node nd passed to the function if it is an element whose content model, as declared in the governing DTD, allows #PCDATA; otherwise returns an empty node-set.
Returns path as an absolute URI. By default, a relative path is resolved against the URI of the ruleset. If a second parameter is specified, the base URI of the document containing the first node in that node-set is used instead as the base for resolving the first argument as a relative path.
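The resolution rule above is ordinary relative-URI resolution. The following Java sketch illustrates it with java.net.URI#resolve; the class and method names are hypothetical, and this is not XMLProbe's own implementation.

```java
import java.net.URI;

// Hypothetical helper illustrating relative-URI resolution.
class AbsoluteUri {
    // Resolves path against base using java.net.URI#resolve (RFC 2396 rules).
    // base stands in for the ruleset URI, or for the base URI of the document
    // containing the node passed as the second argument.
    // An already absolute path is returned unchanged.
    static String asAbsolute(String path, String base) {
        return URI.create(base).resolve(path).toString();
    }
}
```

For example, resolving "vocab.xml" against a ruleset at file:/C:/rules/ruleset.xml yields file:/C:/rules/vocab.xml.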
Returns the Unicode code point value for the single-character string s as a decimal number, e.g. for 'a' a value of 97 is returned. Note that surrogate pairs are not supported.
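In Java terms, the surrogate-pair limitation corresponds to reading a single UTF-16 code unit rather than a full code point. The hypothetical sketch below contrasts the two; it is one plausible reading of the note, not a description of XMLProbe's internals.

```java
// Hypothetical illustration of code units versus code points.
class CodePoint {
    // Reads only the first UTF-16 code unit: a supplementary character
    // (encoded as a surrogate pair) yields its high surrogate.
    static int firstCodeUnit(String s) {
        return s.charAt(0);
    }

    // Reads the full code point, combining a surrogate pair if present.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }
}
```

For "a" both methods return 97; for U+1F600 (the surrogate pair \uD83D\uDE00) the first returns 55357 (0xD83D) while the second returns 128512.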
Checks cell spanning in CALS and OASIS Exchange tables. The node passed in must be a tgroup element. The function always returns an empty node-set; if the table contains cell-spanning errors, an error message is emitted under each of the following conditions:
If the tgroup has a cols attribute whose value cannot be coerced to an integer, the message "Value of cols attribute is not an integer" is emitted.
If a row element contains more entry children than the table specification allows, the message "table row has too many cells!" is emitted.
If the specification of two or more cells means that they would occupy the same area of the table grid (i.e. vertically or horizontally spanned cells collide), the message "table cell may encroach into area reserved by spanning operation" is emitted. In this case, the location reported is that of the later-specified cell, i.e. the one that causes the collision.
Note that the attributes colname, namest and nameend, if present, will have the prefix 'col' removed before being processed.
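The collision condition can be pictured as a grid-occupancy check: each cell claims the grid positions it spans, and any cell that lands on an already claimed position is reported. The Java sketch below is a hypothetical illustration of that idea, not XMLProbe's algorithm; the Cell record, the 1-based column numbers and the columnNumber helper (which assumes column names of the form colN) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a grid-occupancy check for spanned table cells.
class SpanCheck {
    // A cell covering rows row..row+moreRows and columns startCol..endCol (1-based, inclusive).
    record Cell(int row, int startCol, int endCol, int moreRows) {}

    // Mirrors the documented preprocessing: strip a leading "col" from
    // colname/namest/nameend, then read the remainder as a column number.
    static int columnNumber(String name) {
        String digits = name.startsWith("col") ? name.substring(3) : name;
        return Integer.parseInt(digits);
    }

    // Returns the indices of cells that would overlap positions already claimed
    // by an earlier cell, or that fall outside the grid. The later cell is reported.
    static List<Integer> collisions(List<Cell> cells, int rows, int cols) {
        boolean[][] used = new boolean[rows + 1][cols + 1];
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < cells.size(); i++) {
            Cell c = cells.get(i);
            int lastRow = c.row() + c.moreRows();
            boolean clash = false;
            for (int r = c.row(); r <= lastRow && !clash; r++)
                for (int k = c.startCol(); k <= c.endCol() && !clash; k++)
                    clash = r > rows || k > cols || used[r][k];
            if (clash) {
                bad.add(i);
                continue;
            }
            for (int r = c.row(); r <= lastRow; r++)
                for (int k = c.startCol(); k <= c.endCol(); k++)
                    used[r][k] = true;
        }
        return bad;
    }
}
```

In a 2 x 2 table where the first cell of row 1 spans down into row 2, a row-2 entry that also starts in column 1 is the one reported.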
Returns the node-set passed to the function if the string value of that node-set contains a character whose Unicode code point falls within the CombiningChar subset, as defined in Appendix B of the XML Recommendation 1.0. Otherwise the function returns an empty node-set.
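The test amounts to scanning the string's code points against the ranges of the CombiningChar production. In the hypothetical Java sketch below, only the first two ranges of that production (#x0300-#x0345 and #x0360-#x0361) are listed; the full production in Appendix B contains many more, so the class is illustrative only.

```java
// Hypothetical sketch: checks code points against a subset of CombiningChar.
class CombiningCheck {
    // Only the first two ranges of the XML 1.0 CombiningChar production;
    // the remaining ranges from Appendix B are omitted here.
    private static final int[][] RANGES = { {0x0300, 0x0345}, {0x0360, 0x0361} };

    static boolean containsCombiningChar(String s) {
        return s.codePoints().anyMatch(cp -> {
            for (int[] r : RANGES) {
                if (cp >= r[0] && cp <= r[1]) return true;
            }
            return false;
        });
    }
}
```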
Note that in XMLProbe the document() function takes an optional second argument. If it is specified, the base URI of the document containing that node is used as the base for resolving the first argument as a relative path.
Returns true if s1 ends with s2.
The parameter ns passed to this function must be a single node. The function returns this node-set if the file denoted by the string value of that node exists; if the file does not exist, an empty node-set is returned.
If a second parameter is specified, the base URI of the document containing the first node in that node-set is used as the base for resolving the first argument as a relative path.
As per file-exists() above, except that the case of the filename is taken into account when deciding whether a file exists. This function is useful in operating environments that do not strictly distinguish filename case (e.g. Microsoft Windows).
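On a case-insensitive file system, a plain existence check treats One.xml and one.xml as the same file. One way to add case sensitivity, sketched below in Java, is to require the exact name to appear in the parent directory's listing. The FileChecks class and this strategy are assumptions for illustration, not XMLProbe's documented method.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of plain and case-sensitive existence checks.
class FileChecks {
    // Plain check: on a case-insensitive file system, "One.XML" may match "one.xml".
    static boolean fileExists(Path p) {
        return Files.exists(p);
    }

    // Case-sensitive variant: the exact file name must also appear in the parent directory listing.
    static boolean fileExistsExactCase(Path p) {
        if (!Files.exists(p)) return false;
        Path abs = p.toAbsolutePath().normalize();
        Path name = abs.getFileName();
        Path parent = abs.getParent();
        if (name == null || parent == null) return true; // a file-system root has no name to compare
        String[] entries = parent.toFile().list();
        if (entries == null) return false;
        for (String e : entries) {
            if (e.equals(name.toString())) return true;
        }
        return false;
    }
}
```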
Returns, as a node-set, an XML fragment representing an area of the file system.
The fragment is built by performing a recursive descent of the file system from the location indicated by the arguments. The second argument supplies the base URL for this descent: the URL of the directory containing the document that contains this node. The first argument is then resolved relative to this base.
So, for example, the call file-system-as-xml('.', .) performs the following steps:
The second argument is examined to establish an absolute URL. In this case the node specified is . (dot), the current node of the document being processed, so the absolute URL is taken to be that of the directory containing that document.
The first argument is examined to build a path relative to that URL. In this case the directory '.' (the current directory) is specified, and it is resolved relative to the URL determined above.
The recursive descent is performed, and the result returned as an XML fragment.
The fragment consists of item elements. File items are always empty elements; directory items may contain further item elements for their contents, and so on recursively. Every item element has the following attributes:
| Attribute | Description |
|---|---|
| absolutePath | The absolute path of the file system item, in system-specific syntax. On UNIX systems, a relative pathname is made absolute by resolving it against the current user directory. On Microsoft Windows systems, a relative pathname is made absolute by resolving it against the current directory of the drive named by the pathname, if any; if not, it is resolved against the current user directory. |
| absoluteURL | A file: URL that represents the item. |
| isDirectory | 'true' or 'false', depending on whether the item is a directory. |
| isFile | 'true' or 'false', depending on whether the item is a file. |
| isHidden | 'true' or 'false', depending on whether the item is hidden. |
| lastModified | A numeric value representing the time the file was last modified, measured in milliseconds since the epoch (00:00:00 GMT, January 1, 1970). |
| length | The length of the file in bytes. For directories this is always 0. |
| name | The name of the file or directory, i.e. the last name in the item's pathname. |
| parent | The pathname of the item's parent directory, or the empty string if the pathname does not name a parent directory. |
| path | The item's pathname as a string, using the system's default name-separator character to separate the names in the pathname. |
| relativeURL | A relative URL that represents the item. The base for relativization is the root item of the fragment, so the relativeURL of the root item is always "." (dot). |
An example of XML created by this function follows:

    <item absolutePath="C:\docs\Body" isDirectory="true" isFile="false" isHidden="false" lastModified="1125494948906" length="0" name="Body" path="C:\docs\Body" parent="C:\docs" absoluteURL="file:/C:/docs/Body/" relativeURL=".">
        <item absolutePath="C:\docs\Body\one.xml" isDirectory="false" isFile="true" isHidden="false" lastModified="1056612510000" length="2276" name="one.xml" path="C:\docs\Body\one.xml" parent="C:\docs\Body" absoluteURL="file:/C:/docs/Body/one.xml" relativeURL="one.xml"/>
        <item absolutePath="C:\docs\Body\two.xml" isDirectory="false" isFile="true" isHidden="false" lastModified="1130234981828" length="198400" name="two.xml" path="C:\docs\Body\two.xml" parent="C:\docs\Body" absoluteURL="file:/C:/docs/Body/two.xml" relativeURL="two.xml"/>
    </item>
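The recursive descent can be sketched in Java with java.io.File, whose accessor methods correspond closely to the attributes described above. The FsXml class below is hypothetical: it emits only a subset of the attributes and sorts children by pathname for stable output. XMLProbe's actual ordering and escaping rules are not documented here.

```java
import java.io.File;
import java.net.URI;
import java.util.Arrays;

// Hypothetical sketch of building item elements by recursive descent.
class FsXml {
    // Escapes characters that are unsafe inside a double-quoted XML attribute.
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    static String toXml(File root) {
        StringBuilder sb = new StringBuilder();
        item(root, root.toURI(), sb, "");
        return sb.toString();
    }

    private static void item(File f, URI base, StringBuilder sb, String indent) {
        // relativeURL is computed against the root item; the root itself becomes ".".
        String rel = base.relativize(f.toURI()).toString();
        if (rel.isEmpty()) rel = ".";
        sb.append(indent).append("<item")
          .append(" name=\"").append(esc(f.getName())).append('"')
          .append(" isDirectory=\"").append(f.isDirectory()).append('"')
          .append(" isFile=\"").append(f.isFile()).append('"')
          .append(" length=\"").append(f.isDirectory() ? 0 : f.length()).append('"')
          .append(" lastModified=\"").append(f.lastModified()).append('"')
          .append(" absoluteURL=\"").append(esc(f.toURI().toString())).append('"')
          .append(" relativeURL=\"").append(esc(rel)).append('"');
        File[] children = f.isDirectory() ? f.listFiles() : null;
        if (children == null || children.length == 0) {
            sb.append("/>\n"); // files (and empty directories) are empty elements
            return;
        }
        sb.append(">\n");
        Arrays.sort(children);
        for (File c : children) item(c, base, sb, indent + "  ");
        sb.append(indent).append("</item>\n");
    }
}
```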
Returns a string representing a hash value generated from any DTD parsed while validating the instance being processed. If no DTD was processed, returns the string "0".
Returns a string that is the PUBLIC identifier for the DTD that governs the passed node, or an empty string if it is not so governed or this value is not present.
Returns a string that is the SYSTEM identifier for the DTD that governs the passed node, or an empty string if it is not so governed or this value is not present.
Returns a non-empty node-set if any node in the node-set ns passed is enclosed in a CDATA section, otherwise an empty node-set.
Returns a non-empty node-set if the string value of arg1, or the string value of any node in the node-set arg1, is present in arg2; otherwise returns an empty node-set.
Each node in arg1 and arg2 is evaluated using string(). If any string value from arg1 is present in the list of string values of arg2, a non-empty node-set is returned; otherwise an empty node-set is returned.
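The membership test described above can be sketched in Java as a set lookup over string values. In this hypothetical InNodeset class, plain lists of strings stand in for the string() values of the nodes in arg1 and arg2, and a boolean stands in for the empty or non-empty node-set result.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the in-nodeset membership test.
class InNodeset {
    // arg1Values and arg2Values stand for the string() values of the nodes in arg1 and arg2.
    static boolean inNodeset(List<String> arg1Values, List<String> arg2Values) {
        Set<String> lookup = new HashSet<>(arg2Values);
        for (String v : arg1Values) {
            if (lookup.contains(v)) return true; // one match is enough
        }
        return false;
    }
}
```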
Returns a non-empty node-set if the string value of arg1, or the string value of any node in the node-set arg1, is present in the node-set returned by evaluating arg2 as an XPath expression; otherwise returns an empty node-set.
arg2 is evaluated using string(), and the resulting string is evaluated as an XPath expression. Each node in arg1 is evaluated using string(). If any string value from arg1 is present in the list of string values returned by that XPath expression, a non-empty node-set is returned; otherwise an empty node-set is returned.
Returns a non-empty node-set if the content model for node n is declared as EMPTY in the governing DTD for the instance, otherwise an empty node-set.
Returns a non-empty node-set if the passed string s is a valid 10-digit ISBN (i.e. conforms to the 10-digit ISBN lexical form, and has a correct checksum).
Returns a non-empty node-set if the passed string s is a valid 13-digit ISBN (i.e. conforms to the 13-digit ISBN lexical form, and has a correct checksum).
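The two checksum rules can be sketched in Java as follows. ISBN-10 weights the digits 10 down to 1 and requires the sum to be divisible by 11, with 'X' standing for 10 in the final position; ISBN-13 weights the digits alternately 1 and 3 and requires the sum to be divisible by 10. The IsbnCheck class is hypothetical and assumes the bare identifier without hyphens or spaces, since the accepted lexical form is not spelled out above.

```java
// Hypothetical sketch of ISBN-10 and ISBN-13 checksum validation.
class IsbnCheck {
    // Assumption: input is the bare identifier, no hyphens or spaces.
    static boolean isValidIsbn10(String s) {
        if (!s.matches("\\d{9}[\\dX]")) return false;
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            char c = s.charAt(i);
            int d = (c == 'X') ? 10 : c - '0';
            sum += d * (10 - i); // weights 10, 9, ..., 1
        }
        return sum % 11 == 0;
    }

    static boolean isValidIsbn13(String s) {
        if (!s.matches("\\d{13}")) return false;
        int sum = 0;
        for (int i = 0; i < 13; i++) {
            int d = s.charAt(i) - '0';
            sum += (i % 2 == 0) ? d : 3 * d; // alternating weights 1, 3
        }
        return sum % 10 == 0;
    }
}
```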
Returns the string value of the node passed in, formatted as ISO 8601, if that string value conforms to the ISO 8601 format; otherwise returns an empty node-set.
Returns a non-empty node-set if the string value of s is a valid ISSN (i.e. conforms to the 8-digit ISSN lexical form, and has a correct checksum).
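ISSN uses a related mod-11 scheme: the first seven digits are weighted 8 down to 2, and the check digit is (11 - sum mod 11) mod 11, written 'X' when it equals 10. The hypothetical IssnCheck class below assumes the input is either NNNN-NNNC or NNNNNNNC, which may be broader or narrower than what XMLProbe accepts.

```java
// Hypothetical sketch of ISSN checksum validation.
class IssnCheck {
    // Assumption: accepts "NNNNNNNC" or "NNNN-NNNC", where C is a digit or 'X'.
    static boolean isValidIssn(String s) {
        String t = s.matches("\\d{4}-\\d{3}[\\dX]") ? s.replace("-", "") : s;
        if (!t.matches("\\d{7}[\\dX]")) return false;
        int sum = 0;
        for (int i = 0; i < 7; i++) {
            sum += (t.charAt(i) - '0') * (8 - i); // weights 8, 7, ..., 2
        }
        int check = (11 - sum % 11) % 11; // a result of 10 is written as 'X'
        char expected = (check == 10) ? 'X' : (char) ('0' + check);
        return t.charAt(7) == expected;
    }
}
```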
Returns a non-empty node-set if the string value of node-to-check (as evaluated by string()) is present in controlled-vocab as refined by node-key and vocab-lookup.
node-to-check - this should be the node whose value is to be searched for in the controlled vocabulary, e.g. a role attribute
node-key - this should be the value to match vocab-lookup against in the controlled vocabulary
controlled-vocab - this should be the node-set where controlled values are found, e.g. an external XML document
vocab-lookup - this XPath expression will be applied to controlled-vocab to restrict which nodes are considered part of the controlled vocabulary.
The function executes as follows:
an XPath expression is constructed thus:
//*[ vocab-lookup = 'node-key' ]
(e.g. //*[ ../@forElementType = 'piece' ])
the resulting XPath expression is evaluated against controlled-vocab, producing the node-set of values applicable in this context (e.g. document('vocab.xml')//*[ ../@forElementType = 'sect' ] returns all elements whose parent has a forElementType attribute with the value 'sect')
finally, the extension function in-nodeset() is used to test whether the value of node-to-check is present in this refined controlled-vocabulary node-set.
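The three steps above can be sketched in Java with the standard javax.xml.xpath API. The VocabCheck class is hypothetical; it assumes node-key contains no apostrophe (which would break the constructed string literal) and compares values using each matched node's text content, which for elements equals the XPath string value.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the controlled-vocabulary lookup.
class VocabCheck {
    static boolean inVocab(String value, String nodeKey, Document vocab, String vocabLookup)
            throws XPathExpressionException {
        // Step 1: construct //*[ vocab-lookup = 'node-key' ] (assumes nodeKey has no apostrophe).
        String expr = "//*[ " + vocabLookup + " = '" + nodeKey + "' ]";
        XPath xp = XPathFactory.newInstance().newXPath();
        // Step 2: evaluate the expression against the controlled vocabulary.
        NodeList hits = (NodeList) xp.evaluate(expr, vocab, XPathConstants.NODESET);
        // Step 3: membership test on string values, in the manner of in-nodeset().
        for (int i = 0; i < hits.getLength(); i++) {
            if (value.equals(hits.item(i).getTextContent())) return true;
        }
        return false;
    }
}
```

With a vocabulary such as <vocab><list forElementType="sect"><v>intro</v></list></vocab>, calling inVocab("intro", "sect", doc, "../@forElementType") returns true.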
Returns the node-set passed in if regexp matches anywhere in string-to-search (i.e. the whole string does not have to match), otherwise an empty node-set.
If group-index is specified, the indexed group exists in regexp, and a match is made, the function returns only the indexed group. If no such group exists in regexp, an IndexOutOfBoundsException is reported and an empty node-set is returned.
The string regexp is compiled to a java.util.regex.Pattern. An error is reported if the regular expression syntax is incorrect. If the expression compiles successfully, a match is attempted using java.util.regex.Matcher#find(). If a match is made, the node-set passed in is returned, otherwise an empty node-set.
For more information about the regular expression syntax to be used, please see http://java.sun.com/j2se/1.4.1/docs/api/java/util/regex/Pattern.html.
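The following Java sketch mirrors the steps described above: compile with Pattern.compile, search with Matcher#find(), and optionally extract a group with Matcher#group(int), which throws IndexOutOfBoundsException when the pattern has no group with that index. The RegexMatch class and its Optional return value are hypothetical stand-ins for node-set results; the whole input string stands in for the node-set passed in.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a find-anywhere regex match with optional group extraction.
class RegexMatch {
    static Optional<String> find(String input, String regexp, Integer groupIndex) {
        Pattern p = Pattern.compile(regexp); // PatternSyntaxException propagates on bad syntax
        Matcher m = p.matcher(input);
        if (!m.find()) return Optional.empty(); // find(): the match may occur anywhere
        if (groupIndex == null) return Optional.of(input); // "node-set passed in" is returned
        try {
            // A group that did not participate in the match yields null, hence ofNullable.
            return Optional.ofNullable(m.group(groupIndex));
        } catch (IndexOutOfBoundsException e) {
            return Optional.empty(); // no such group in the pattern
        }
    }
}
```

Note the contrast with String#matches, which requires the whole string to match: "abc123def".matches("\\d+") is false, whereas find locates "123".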
Returns the substring of system-id (as evaluated by string()) that follows the last occurrence of the system path separator (typically '/' or '\'). If no path separator is present, the value is returned unchanged. Note that this also means that for an empty node-set or the empty string (""), the empty string is returned.
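A minimal Java sketch of this rule follows. The FileNamePart class is hypothetical, and it assumes both '/' and '\' count as separators; the documented function may recognise only the platform's own separator.

```java
// Hypothetical sketch: returns the part of a system identifier after the last separator.
class FileNamePart {
    // Assumption: both '/' and '\' are treated as separators.
    static String afterLastSeparator(String systemId) {
        int i = Math.max(systemId.lastIndexOf('/'), systemId.lastIndexOf('\\'));
        return i < 0 ? systemId : systemId.substring(i + 1); // unchanged if no separator
    }
}
```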
Returns a new node-set containing the result of evaluating the XPath expression xpath against each node in nodes.
Returns the system identifier of the instance as a string, exactly as it was passed to XMLProbe for processing.
It is sensible always to work with an absolute version of the URL returned by this function, by wrapping it in as-absolute-uri(), e.g.
as-absolute-uri( system-id() )
It is best also to specify the resolution base when passing such values to document(), where this is appropriate, e.g.:
document( as-absolute-uri( system-id(), . ) )/foo/bar
Returns the MIME type of the entity located at url, or the empty string if the type cannot be determined (e.g. if the content type is not supported or the URL cannot be accessed). By default, a relative url is resolved against the URI of the ruleset. If a second parameter is specified, the base URI of the document containing the first node in that node-set is used instead as the base for resolving the first argument. The currently supported formats and the MIME type returned for each are shown below:
| Format | MIME type |
|---|---|
| GIF | image/gif |
| TIFF | image/tiff |
| XML | application/xml* |
| PDF | application/pdf |
| Postscript | application/postscript |
| HTML | text/html** |
| PNG | image/png |
| ZIP | application/zip |
*Only instances which include an XML declaration are supported. Supported encodings are: US-ASCII, UTF-8, UTF-16 (with or without Byte Order Mark).
**Only instances which omit a DOCTYPE declaration are supported.
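How XMLProbe determines these types is not documented here. One common technique is content sniffing by leading "magic" bytes, sketched below in Java as an assumption. The MimeSniff class is hypothetical; it omits HTML detection and handles only ASCII-compatible XML declarations, not UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: guesses a MIME type from the first bytes of an entity.
class MimeSniff {
    static String sniff(byte[] head) {
        if (startsWith(head, "GIF87a") || startsWith(head, "GIF89a")) return "image/gif";
        if (startsWith(head, new byte[]{(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'})) return "image/png";
        if (startsWith(head, new byte[]{'I', 'I', 42, 0}) || startsWith(head, new byte[]{'M', 'M', 0, 42})) return "image/tiff";
        if (startsWith(head, "%PDF-")) return "application/pdf";
        if (startsWith(head, "%!")) return "application/postscript";
        if (startsWith(head, new byte[]{'P', 'K', 3, 4})) return "application/zip";
        if (startsWith(head, "<?xml")) return "application/xml"; // ASCII/UTF-8 declarations only
        return ""; // type could not be determined
    }

    static boolean startsWith(byte[] data, String ascii) {
        return startsWith(data, ascii.getBytes(StandardCharsets.US_ASCII));
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```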
Converts the node passed in to a string using the string() function, and returns false if that string is not a syntactically correct URI, otherwise true.
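A minimal Java sketch of such a syntax check uses the java.net.URI constructor, which throws URISyntaxException for malformed input. Whether XMLProbe applies the same grammar (java.net.URI follows RFC 2396) is an assumption; the UriSyntax class is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of a URI syntax check.
class UriSyntax {
    // Assumption: java.net.URI's RFC 2396 parser stands in for XMLProbe's check.
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

Under this sketch, both absolute URIs such as file:/C:/docs/Body/one.xml and relative references such as one.xml are accepted, while a string containing an unescaped space is rejected.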