The XML Site: XML schema and XSL templates for a web article

Defining the schema and templates for generating web articles

Last time we defined an XML Schema and a set of XSL templates to generate a web site, with generic, minimal support for web pages. This time we specialize it for a very common kind of web page: the article.

Note: We consider an article to be a web page that deals mostly with text content.

The article model

We start by identifying the parts of an article that we want to represent in XML: titles and sections. A section can in turn contain a title, content (paragraphs, lists, etc.) and other sections.

The XML Schema

So we define a section element that can contain a title, followed by content blocks and/or other sections. For now we implement only one kind of content block: the plain text paragraph.

The section type

A section can have, in this order:

  • A title
  • One or more text paragraphs
  • One or more sub-sections

All of these are optional. A section should contain at least a title or a paragraph, although the schema below does not enforce this.

  <xs:complexType name="sectionType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" minOccurs="0"/>
      <xs:element name="paragraph" type="xs:string" minOccurs="0"
        maxOccurs="unbounded"/>
      <xs:element name="section" type="sectionType" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
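Every element in that sequence has minOccurs="0", so an empty section still validates. Below is a stricter variant. It is an illustrative sketch of our own and is not taken from the downloadable files. It enforces the title-or-paragraph rule with a choice between two branches: "a title, optionally followed by paragraphs" and "one or more untitled paragraphs".

  <xs:complexType name="sectionType">
    <xs:sequence>
      <xs:choice>
        <!-- Either a title, optionally followed by paragraphs... -->
        <xs:sequence>
          <xs:element name="title" type="xs:string"/>
          <xs:element name="paragraph" type="xs:string" minOccurs="0"
            maxOccurs="unbounded"/>
        </xs:sequence>
        <!-- ...or no title and at least one paragraph. -->
        <xs:element name="paragraph" type="xs:string"
          maxOccurs="unbounded"/>
      </xs:choice>
      <!-- Sub-sections stay optional and still come last. -->
      <xs:element name="section" type="sectionType" minOccurs="0"
        maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

The two branches start with different elements (title versus paragraph). That keeps the content model deterministic, as XML Schema's Unique Particle Attribution rule requires. Every section in the test content below would still validate against it.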

The article type

We define the article type as an extension of the page type. The article itself is just a sequence of sections, and at least one section is required.


  <xs:complexType name="articleType">
    <xs:complexContent>
      <xs:extension base="pageType">
        <xs:sequence>
          <xs:element name="section" type="sectionType" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
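Since minOccurs defaults to 1, maxOccurs="unbounded" on its own means "one or more", and that is what makes the section mandatory. Here is a quick illustration. The ids are hypothetical. It also assumes that pageType contributes only the id attribute and no required child elements, which is what the test content further down suggests.

  <!-- Valid: a single untitled section with one paragraph is enough. -->
  <article id="minimal">
    <section>
      <paragraph>Just one paragraph.</paragraph>
    </section>
  </article>

  <!-- Not valid: an article must contain at least one section. -->
  <article id="empty"/>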

The website type

The website type, previously defined to accept only pages, must now accept both pages and articles, in any order.

  <xs:complexType name="websiteType">
    <xs:sequence>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="page" type="pageType"/>
        <xs:element name="article" type="articleType"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
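Because the choice itself repeats (maxOccurs="unbounded"), pages and articles can be interleaved freely. At least one of them is required, since minOccurs again defaults to 1. Below is a sketch of a conforming document under several assumptions: the root element is named website, the page ids are made up, and a page needs nothing beyond its id attribute. Namespace declarations are omitted.

  <website>
    <!-- Pages and articles may appear in any order. -->
    <page id="index"/>
    <article id="first-article">
      <section>
        <title>Introduction</title>
      </section>
    </article>
    <page id="about"/>
  </website>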

The Website XML content

Here is some test content that conforms to the schema.

  <article id="article">
    <section>
      <title>Section 1</title>
      <paragraph>Paragraph</paragraph>
      <section>
        <title>Sub-Section 1.1</title>
        <section>
          <title>Sub-Section 1.1.1</title>
          <paragraph>Text</paragraph>
        </section>
      </section>
    </section>
    <section>
      <paragraph>Section 2 Paragraph</paragraph>
    </section>
  </article>

The XSLT processing rules

As a first step, we ignore the recursive nesting of sections and produce a flat sequence of titles and paragraphs.
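No template for section is needed to get this flat view. When no rule matches an element, XSLT's built-in template applies, and it simply applies templates to the element's children. Sections and all their sub-sections are therefore walked in document order, and only their titles and paragraphs produce output. Written out explicitly, the built-in rule behaves like the template below. It is shown for illustration only; adding it would not change the result.

  <!-- Equivalent of the built-in rule for elements: emit nothing for the
       section itself, just process its children in document order. -->
  <xsl:template match="section">
    <xsl:apply-templates/>
  </xsl:template>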

The article template

We extend the page template so that it matches both pages and articles. The template body stays essentially the same, except that we now apply templates inside the HTML body element so that the content of the page or article gets processed.

  <xsl:template match="page | article">
    <xsl:result-document href="{@id}.html">
      <html>
        <head/>
        <body>
          <xsl:apply-templates/>
        </body>
      </html>
    </xsl:result-document>
  </xsl:template>
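The stylesheet header from the previous installment is not repeated here. Below is a minimal skeleton built on our own assumptions. xsl:result-document is an XSLT 2.0 instruction, so the stylesheet must declare version="2.0" and run on a 2.0 processor such as Saxon. The xsl:output settings are inferred from the generated HTML at the end of this article and may differ from the original files.

  <xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- xsl:result-document without a format attribute uses this unnamed
         output definition, so every generated file gets the same doctype. -->
    <xsl:output method="html" encoding="ISO-8859-1"
      doctype-public="-//W3C//DTD HTML 4.01//EN"
      doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>

    <!-- The page | article, paragraph and title templates go here. -->

  </xsl:stylesheet>

The meta element in the generated head is added by the HTML output method itself, which is why the template can leave head empty.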

The paragraph template

This rule matches each paragraph element and wraps its content in an HTML p element.

  <xsl:template match="paragraph">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

The title template

This template matches each title element and wraps its content in an HTML heading element. For now we always use h1. Later we will use the nesting level of the section to decide which heading level to output.

  <xsl:template match="title">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
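For the heading level mentioned above, here is one possible approach, not necessarily what the next installment does. It counts the section ancestors of each title and builds the element name with xsl:element. It assumes titles only appear inside sections, since a title outside any section would yield h0. It also caps the level at h6, the deepest heading HTML defines. It would take the place of the h1-only template:

  <xsl:template match="title">
    <!-- 1 for a top-level section, 2 for a sub-section, and so on,
         capped at 6 with the XPath 2.0 min() function. -->
    <xsl:element name="h{min((count(ancestor::section), 6))}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

With the test content, this would output h1 for "Section 1", h2 for "Sub-Section 1.1" and h3 for "Sub-Section 1.1.1".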

The generated HTML

Nothing fancy at this stage, just the titles and text paragraphs.

<!DOCTYPE html
  PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
   </head>
   <body>
      <h1>Section 1</h1>
      <p>Paragraph</p>
      <h1>Sub-Section 1.1</h1>
      <h1>Sub-Section 1.1.1</h1>
      <p>Text</p>
      <p>Section 2 Paragraph</p>
   </body>
</html>

Download: article files.

Read on: Processing the titles of recursive page sections.

First Posted: December 14th, 2005 - Wednesday.
Last Updated: December 15th, 2005 - Thursday.