Chapter 1. User Guide

Tutorial: Getting Started with XMLProbe

What is XMLProbe?

XMLProbe is an XML content quality assurance tool. It processes XML documents, applying human-authored rules and emitting a report according to those rules. Rules are expressed using XPath 1.0 and XML within an open framework language called SILCN (pronounced as 'silken'), which is a flexible, lightweight framework for selecting, identifying and locating sets of common nodes in XML documents.

The learning curve for XMLProbe will therefore be gentle if you already have some knowledge of XML and XPath. However, even if you have no previous experience of using XPath expressions, the tutorial will show you how XMLProbe may be customized for your own quality assurance requirements with a minimum of delay.

The Validation Process

In simplest terms, XMLProbe's task is to:

  1. use XPath expressions given in a SILCN rule to locate particular 'bad' nodes in your document

  2. report its findings in the form of messages specified in your rules.

The report itself is an XML document (call it 'report.xml' for our purposes), so you can use the flexibility of XML to do such things as:

  • produce expressive, easy-to-read QA reports as web pages

  • trigger other automated processes (e.g. email alerting)

  • feed into data-cleansing software.

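In this document, we assume the goal is to produce an HTML quality report on a document (say, 'to-test.xml'). In this case XMLProbe first applies the rules to 'to-test.xml' and emits 'report.xml'; an XSLT script then transforms 'report.xml' into the HTML report.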

Installing and running XMLProbe

Place the executable JAR provided (xmlprobe.jar) in an appropriate location on your computer. You will also need to have the Sun Java JDK 1.4 or later installed. For Java downloads, please see http://java.sun.com/.

Pass the system identifiers of the rules file and of the XML file to be processed to XMLProbe as follows:

java -jar xmlprobe.jar {rules doc} {doc to test} [>] [report file]

Anatomy of an XMLProbe configuration file

XMLProbe QA rules may be prepared with any XML editor, or simply with a plain text editor, and stored as a native XML file for use by XMLProbe.

An XMLProbe configuration file can be thought of as falling into two main parts:

  1. XML that configures the tool behaviour

  2. The QA rules.

Typically the configuration options will not be changed very much, once set up for a particular workflow. The quickest way to see what sort of options they cover is to refer to the comments in the supplied skeleton.xml rules file which is included in the distribution in the extras/rules folder. More detailed technical documentation can be found in the later sections of this document.

The remainder of the configuration instance consists of QA rules, and it is the writing of these that this tutorial will cover.

To get you started as quickly as possible and to give you some idea of the capabilities of XMLProbe, this tutorial includes an example that demonstrates the beginning stages of rules development.

Writing XMLProbe rules

The example file wai-qa.xml contains a set of rules to check any given XHTML document against the Web Accessibility Initiative (WAI) Guidelines (http://www.w3.org/TR/WCAG10/). Once you have reviewed the example, you can explore XMLProbe further by writing some new QA rules, or by modifying some of the rules provided. You may also test the flexibility of the resulting report by using and adapting the XSLT scripts.

An Example: Generating a WAI conformance report

Each XMLProbe QA rule is encapsulated by a silcn:set-criterion element within the supplied wai-qa.xml instance.

The basic structure of an XMLProbe rule is as follows:

<silcn:set-criterion>
<silcn:id> MessageIDnumber </silcn:id>
<silcn:expression>
XPath expression and QA test
</silcn:expression>
<probe:message>
Plain language error message to be reported if the XPath
expressions are true for each relevant node found by XMLProbe.
</probe:message>
</silcn:set-criterion>

where the silcn Namespace prefix is bound to the Namespace name http://silcn.org/200309, and the probe Namespace prefix is bound to the Namespace name http://xmlprobe.com/200312.
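
Both prefixes must be declared somewhere in scope for a rule to be usable. As a minimal sketch, the fragment below places the declarations directly on the rule element, purely so that it stands alone; how the distribution itself declares them is best checked in the supplied skeleton.xml. The identifier 9000 and the XPath expression are illustrative assumptions and are not taken from wai-qa.xml.

```xml
<!-- Hypothetical standalone rule: the namespace declarations appear on the
     rule element itself only to make this fragment self-contained. -->
<silcn:set-criterion
    xmlns:silcn="http://silcn.org/200309"
    xmlns:probe="http://xmlprobe.com/200312">
  <silcn:id>9000</silcn:id>
  <!-- Selects every a element that has no href attribute -->
  <silcn:expression>//a[not(@href)]</silcn:expression>
  <probe:message>Anchor element without an href attribute.</probe:message>
</silcn:set-criterion>
```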

If you now look at the wai-qa.xml instance, you will see that constructing the rules is quite simple.

Take the very first Priority 1 WAI guideline, Number 1.1:

http://www.w3.org/TR/WCAG10/ Provide a text equivalent for every non-text element (e.g., via 'alt', 'longdesc', or in element content). This includes: images, graphical representations of text (including symbols), image map regions, animations (e.g., animated GIFs), applets and programmatic objects, ascii art, frames, scripts, images used as list bullets, spacers, graphical buttons, sounds (played with or without user interaction), standalone audio files, audio tracks of video, and video. [Priority 1]

A quality assurance test for this particular guideline would require XMLProbe to locate elements such as img, input, applet or object within the HTML document that do NOT carry an alt or a longdesc attribute indicating the presence of a text equivalent.

A simple XPath test in the case of the img element, for instance, would take the form: //img[not(@alt) and not(@longdesc)], where the location path indicates that the search should proceed down the document, matching any img element that has neither an alt nor a longdesc attribute. A similar test for applet elements, which cannot carry a longdesc attribute, would be: //applet[not(@alt)].

XPath allows the user to combine these tests into a single, powerful search, using the union operator ('|'):

<silcn:set-criterion> <!-- 1 -->
 <silcn:id>1001</silcn:id> <!-- 2 -->
 <silcn:expression>//img[not(@alt) and not(@longdesc)]
|//input[not(@alt)]
|//applet[not(@alt)]
|//object[not(@alt) and not(@longdesc)]</silcn:expression> <!-- 3 -->
 <probe:message>WAI Guideline 1.1: <probe:eval>name()</probe:eval>
should have a text-equivalent in the form of an alt
or longdesc attribute.</probe:message> <!-- 4 -->
</silcn:set-criterion>

  1. The silcn:set-criterion element is the container for the rule

  2. The unique identifier for the rule

  3. The XPath expression for the rule

  4. What to report when the rule is triggered.

You can fully customize the error message you wish XMLProbe to generate: in this case we chose to include the particular guideline with which the document has failed to comply. Note here the use of the probe:eval element, which allows dynamic text to be incorporated into the error message. Its content is also an XPath expression, evaluated in the context of each node matched by the rule's own XPath expression.

Element content not in the XMLProbe or SILCN namespaces will be passed through verbatim, allowing literal elements to be included in the <probe:message>.
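
The two features just described can be combined in one message. The following is a sketch only: the identifier 9001 is hypothetical, the em element is an arbitrary choice of literal markup, and it is assumed that probe:eval inserts the string value of its expression, as it appears to do for name() in the rule above.

```xml
<!-- Hypothetical rule. The em element is in neither the XMLProbe nor the
     SILCN namespace, so it is expected to reach the report unchanged.
     string(@src) is evaluated relative to each matched img element. -->
<silcn:set-criterion>
  <silcn:id>9001</silcn:id>
  <silcn:expression>//img[not(@alt) and not(@longdesc)]</silcn:expression>
  <probe:message><em>Missing text equivalent:</em> the image
<probe:eval>string(@src)</probe:eval> has neither an alt nor a
longdesc attribute.</probe:message>
</silcn:set-criterion>
```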

The silcn:id element content can be used to differentiate various types or severity of errors. In this case Priority 1, 2, and 3 error-messages have corresponding silcn:id element content, starting with the digits 1, 2, and 3 respectively - this can be used to affect how errors are presented later in the process.

This rule displays the three essential characteristics of an XMLProbe rule. These are:

  1. Validation is concerned with reporting on bad or dubious data, so the formulation of rules is rooted in a clear expression of what is 'wrong' or 'dubious'

  2. XPath expressions must always result in nodes which expose such bad or dubious data

  3. Every rule must have a unique identifier.

XPath expressions may be authored for a wide range of other QA checks, including:

  • Ensuring an element contains data and is not empty

  • Ensuring data conforms to a required format

  • Ensuring an element has no more than a certain number of a particular child element

  • Ensuring an element has a specific parent or specific children

  • Ensuring specific elements precede or succeed particular elements
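
The checks listed above might be expressed along the following lines. This is a sketch under stated assumptions: the rule identifiers (9101 to 9105), the date element and the messages are all hypothetical, and each expression selects the 'bad' nodes, in keeping with the rule characteristics described earlier. Only XPath 1.0 functions are used.

```xml
<!-- Empty element: a title with no text other than whitespace -->
<silcn:set-criterion>
  <silcn:id>9101</silcn:id>
  <silcn:expression>//title[not(normalize-space(.))]</silcn:expression>
  <probe:message>The title element is empty.</probe:message>
</silcn:set-criterion>

<!-- Required format: a (hypothetical) date element not shaped like NNNN-NN-NN.
     translate() turns every digit into '#' so the shape can be compared. -->
<silcn:set-criterion>
  <silcn:id>9102</silcn:id>
  <silcn:expression>//date[translate(normalize-space(.), '0123456789', '##########') != '####-##-##']</silcn:expression>
  <probe:message>Date <probe:eval>normalize-space(.)</probe:eval> is not in YYYY-MM-DD form.</probe:message>
</silcn:set-criterion>

<!-- Child count: a head with more than one title child -->
<silcn:set-criterion>
  <silcn:id>9103</silcn:id>
  <silcn:expression>//head[count(title) &gt; 1]</silcn:expression>
  <probe:message>The head element has more than one title.</probe:message>
</silcn:set-criterion>

<!-- Specific parent: an li whose parent is neither ul nor ol -->
<silcn:set-criterion>
  <silcn:id>9104</silcn:id>
  <silcn:expression>//li[not(parent::ul or parent::ol)]</silcn:expression>
  <probe:message>List item found outside a ul or ol element.</probe:message>
</silcn:set-criterion>

<!-- Ordering: a dd not immediately preceded by a dt or another dd -->
<silcn:set-criterion>
  <silcn:id>9105</silcn:id>
  <silcn:expression>//dd[not(preceding-sibling::*[1][self::dt or self::dd])]</silcn:expression>
  <probe:message>A dd element must follow a dt or another dd.</probe:message>
</silcn:set-criterion>
```

The format check in rule 9102 is deliberately simple: a value that already contains literal '#' characters in the right positions would pass it, so a stricter test may be needed for untrusted data.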

Take another Priority 1 WAI guideline, Number 6.3:

http://www.w3.org/TR/WCAG10/ Ensure that pages are usable when scripts, applets, or other programmatic objects are turned off or not supported. If this is not possible, provide equivalent information on an alternative accessible page. [Priority 1]...If it is not possible to make the page usable without scripts, provide a text equivalent with the NOSCRIPT element...

The XPath in this case requires a positional test. To ensure that the given document fulfils this requirement, every script element in the document (there may be several) has to be located and tested to see whether it is immediately followed by a noscript alternative. XMLProbe allows for the formulation of such detailed tests, which in wai-qa.xml is expressed thus:

<silcn:set-criterion>
<silcn:id>1002</silcn:id>
<silcn:expression>
//script[ not( following-sibling::*[1][self::noscript])]
</silcn:expression>
<probe:message>
WAI Guideline 6.3: In order to ensure that pages are usable
when scripts are turned off or not supported, provide a text
equivalent with the 'noscript' element.
</probe:message>
</silcn:set-criterion>
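
In the rule above, following-sibling::*[1] selects the first following sibling element of whatever name, and [self::noscript] then tests whether that element is a noscript. For comparison, here is a sketch of a looser variant (the identifier 9201 and the message are hypothetical) that reports a script only when no later sibling at all is a noscript element:

```xml
<!-- Hypothetical, looser variant of rule 1002: a script passes as long as
     some noscript sibling follows it, however far away. -->
<silcn:set-criterion>
  <silcn:id>9201</silcn:id>
  <silcn:expression>//script[not(following-sibling::noscript)]</silcn:expression>
  <probe:message>No 'noscript' element follows this script within the
same parent element.</probe:message>
</silcn:set-criterion>
```

Which of the two is appropriate depends on how strictly "immediately followed" is to be read for the documents being checked.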

Let's now take another Priority 1 WAI guideline, Number 6.1:

http://www.w3.org/TR/WCAG10/ Organize documents so they may be read without style sheets. For example, when an HTML document is rendered without associated style sheets, it must still be possible to read the document. [Priority 1]

In this case XMLProbe needs to look for either style elements anywhere in the document or link elements within the head of the HTML document. If a link element is found, the rule needs to check whether its href attribute ends with a .css extension, indicating a reference to an external stylesheet. The XPath test for the first is //style, matching any style element within the document, while the link elements may be tested with the following XPath expression: /html/head/link/@href[ends-with(., ".css")]. Combining the two, we get:

<silcn:set-criterion>
<silcn:id>1003</silcn:id>
<silcn:expression>
//style
|/html/head/link/@href[ends-with(., ".css")]
</silcn:expression>
<probe:message>
WAI Guideline 6.1: Ensure that pages are still readable
even if the provided internal or external stylesheets
are not usable.
</probe:message>
</silcn:set-criterion>

Note that although this rule gives the user a potentially useful indication of whether or not a web page uses a style sheet, whether or not the page violates a WAI guideline is ultimately a matter of judgement for a human user!
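
One caveat on the expression itself: ends-with() was introduced in XPath 2.0 and is not part of the XPath 1.0 core function library. Whether the XPath engine used by XMLProbe accepts it is not stated here. Should it be rejected, the following sketch shows an equivalent test written with XPath 1.0 functions only; it compares the last four characters of the attribute value with '.css'.

```xml
<!-- Alternative formulation of rule 1003 using only XPath 1.0 functions.
     Use this in place of the original, not alongside it, since rule
     identifiers must be unique. -->
<silcn:set-criterion>
<silcn:id>1003</silcn:id>
<silcn:expression>
//style
|/html/head/link/@href[substring(., string-length(.) - 3) = '.css']
</silcn:expression>
<probe:message>
WAI Guideline 6.1: Ensure that pages are still readable
even if the provided internal or external stylesheets
are not usable.
</probe:message>
</silcn:set-criterion>
```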

Getting results

Once the full set of required rules has been written, try running XMLProbe, either on the sample file (sample.html) or any other XHTML file you would like to check.

XMLProbe will produce a full report (a sample wai-report.xml is included in the distribution). Here is one of the rules triggered by the sample, as it is reported in the XML of the report:

<silcn:matched-set> <!-- 1 -->
 <silcn:id>1002</silcn:id> <!-- 2 -->
 <silcn:node> <!-- 3 -->
  <silcn:expression>/html[1]/body[1]/div[9]/script[2]</silcn:expression> <!-- 4 -->
  <probe:systemId>sample.html</probe:systemId> <!-- 5 -->
  <probe:line>180</probe:line> <!-- 6 -->
  <probe:column>36</probe:column>
  <probe:text>WAI Guideline 6.3: In order to ensure that pages are
usable when scripts are turned off or not supported, provide a text
equivalent with the 'noscript' element.</probe:text> <!-- 7 -->
 </silcn:node>
</silcn:matched-set>

  1. The silcn:matched-set element is the container for each set of matched items in the report

  2. The silcn:id element content identifies which rule is being reported on

  3. Each matched node is described in the content of a silcn:node element

  4. The XPath expression gives a location for the offending node

  5. This is the system identifier of the file which has been tested

  6. XMLProbe also emits a physical location (line and column) for the node, for quick reference using a text editor or other tools that are not XML-aware

  7. This is the message from the rule instance, with any dynamic text resolved.

Generating a web report

Now that we have our error messages, the next stage is to generate a more readable web report. This can be done with a simple XSLT script. The distribution includes probe-report.xsl, which provides the basic routines needed to produce an HTML report from wai-report.xml. Because the initial report is XML, you can adapt the look and feel of the final report in any way you want: write new XSLT templates that override the ones given in probe-report.xsl to sort, rearrange, or ignore altogether messages generated by XMLProbe, depending on their silcn:id or content.
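
To give a feel for what such a transformation involves, here is a minimal standalone XSLT 1.0 sketch. It is not derived from probe-report.xsl, whose template structure is not shown in this tutorial, and it makes two assumptions: that the report uses the same namespace names as the rules, and that matched sets can be found anywhere beneath the report's root element. It lists every matched node in a table, sorted by rule identifier, and takes the first digit of the silcn:id as the priority, following the convention described earlier.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:silcn="http://silcn.org/200309"
    xmlns:probe="http://xmlprobe.com/200312"
    exclude-result-prefixes="silcn probe">

  <xsl:output method="html" indent="yes"/>

  <!-- The name of the report's root element is not assumed; matched sets
       are selected wherever they occur in the document. -->
  <xsl:template match="/">
    <html>
      <head><title>XMLProbe QA report</title></head>
      <body>
        <h1>XMLProbe QA report</h1>
        <table border="1">
          <tr>
            <th>Priority</th><th>Rule</th><th>File</th>
            <th>Line</th><th>Column</th><th>Message</th>
          </tr>
          <xsl:for-each select="//silcn:matched-set">
            <!-- Sort matched sets numerically by rule identifier -->
            <xsl:sort select="silcn:id" data-type="number"/>
            <xsl:apply-templates select="silcn:node"/>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

  <!-- One table row per matched node -->
  <xsl:template match="silcn:node">
    <tr>
      <!-- First digit of the identifier, read as the WAI priority -->
      <td><xsl:value-of select="substring(normalize-space(../silcn:id), 1, 1)"/></td>
      <td><xsl:value-of select="normalize-space(../silcn:id)"/></td>
      <td><xsl:value-of select="probe:systemId"/></td>
      <td><xsl:value-of select="probe:line"/></td>
      <td><xsl:value-of select="probe:column"/></td>
      <td><xsl:value-of select="normalize-space(probe:text)"/></td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

Any XSLT 1.0 processor should be able to apply a stylesheet of this kind to wai-report.xml. Note that xsl:value-of flattens the message to plain text; if your messages carry literal elements that should survive into the HTML, xsl:copy-of on the content of probe:text may be a better fit.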

Additional resources

The following websites offer useful information and introductions to XPath, XSLT, and SILCN:

http://www.w3.org/TR/xpath
http://www.w3.org/TR/xsl/
http://silcn.org/
http://www.nwalsh.com/docs/tutorials/xsl/xsl/frames.html