Introducing XHTML

What is XHTML, and why is it good? In this article you explore the concepts behind XHTML, and learn how it differs from regular HTML.

XHTML, first introduced in 2000, is billed as the successor to HTML. It's short for Extensible Hypertext Markup Language. XHTML 1.0 is essentially a reworking of HTML 4 in XML - Extensible Markup Language. As such, HTML 4 and XHTML 1.0 are very similar.

XHTML is stricter than regular HTML, as you'll see in a moment. While this extra strictness requires a bit more effort when creating XHTML pages, it does mean that those pages are very easy for computers to read. HTML, in contrast, is notoriously difficult for browsers to interpret — which is partly why no two browsers seem to display a Web page in the same way!

Advantages of XHTML

XHTML offers many advantages over HTML. Here are a few important ones:

  • No more badly-written "tag soup" pages. XHTML requires your Web pages to be well-formed. This means that every element is closed and properly nested, so a parser never has to guess at the structure of the markup.
  • XHTML pages are readable by more devices. Because XHTML pages are well-formed, they can be more easily read by simple browsers, such as those in mobile phones and PDAs, as well as by standard HTML browsers.
  • It's easy to extract semantic information from XHTML pages. As XHTML is XML, it can be easily processed by any XML parser, making it easy to automatically extract useful info from your XHTML pages.
  • It's possible to add other XML content to an XHTML page. By using XML namespaces, you can "mix and match" plain XHTML with other XML markup - for example, MathML - allowing you to produce rich, semantic Web pages.
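As a rough illustration of the third point, here is a minimal sketch that uses Python's standard xml.etree.ElementTree parser to pull the title and the links out of an XHTML page. The page content and the helper names (PAGE, extract_title, extract_links) are made up for this example; any XML parser in any language could be used in the same way.

```python
import xml.etree.ElementTree as ET

# Prefix map for the XHTML namespace; parsed element names are qualified with it.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# A small, hypothetical XHTML page used only for illustration.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Fruit list</title></head>
  <body>
    <p>See <a href="/apples.html">apples</a> and
       <a href="/pears.html">pears</a>.</p>
  </body>
</html>"""

def extract_title(xhtml_text):
    """Return the text of the title element, or None if there isn't one."""
    root = ET.fromstring(xhtml_text)
    return root.findtext("x:head/x:title", namespaces=XHTML_NS)

def extract_links(xhtml_text):
    """Return (href, link text) pairs for every a element that has an href."""
    root = ET.fromstring(xhtml_text)
    return [(a.get("href"), a.text)
            for a in root.iterfind(".//x:a[@href]", XHTML_NS)]

print(extract_title(PAGE))
print(extract_links(PAGE))
```

Because the page is well-formed, no HTML-specific error recovery is needed; the same code would fail with a parse error on "tag soup" markup.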

Key differences between XHTML and HTML

The current widely-used version of XHTML is version 1.0. Essentially, it's a stricter version of HTML 4.01.

The main differences between HTML 4.01 and XHTML 1.0 are as follows.

XHTML documents must be well-formed

Every XHTML page you create needs to be well-formed. This means that all elements in an XHTML page must be closed and properly nested. For example, the following markup is invalid XHTML, because the b and i elements aren't properly nested (their start and end tags overlap), and the p elements don't have end tags:


<p>The quick <b>brown <i>fox jumps</b> over</i> the lazy dog.
<p>Every good boy deserves fruit.

Here's a corrected version that validates to XHTML 1.0:


<p>The quick <b>brown <i>fox jumps</i> over</b> the lazy dog.</p>
<p>Every good boy deserves fruit.</p>

All XHTML elements must be closed, even elements that can't be closed in HTML. For example, you can't write just <br> in XHTML; you have to write either <br></br> or, more conveniently, <br />. (The latter format is known as minimized tag syntax, and is the preferred way to write empty elements, that is, elements that don't have an end tag in HTML. The space before the slash helps older HTML browsers cope with the syntax.)

Overlapping elements such as the b and i example above are also technically invalid HTML, although most browsers tolerate them. Omitted end tags, however, are perfectly legal in HTML 4 for elements whose end tag is optional, such as p, li and td.
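One way to see the well-formedness rules in action is to feed the snippets above to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the wrapping div are assumptions made for this example. Note that it only checks well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the markup parses as XML once wrapped in a single root."""
    try:
        # XML needs exactly one root element, so wrap the fragment in a div.
        ET.fromstring("<div>" + fragment + "</div>")
        return True
    except ET.ParseError:
        return False

overlapping = "<p>The quick <b>brown <i>fox jumps</b> over</i> the lazy dog."
corrected = "<p>The quick <b>brown <i>fox jumps</i> over</b> the lazy dog.</p>"

print(is_well_formed(overlapping))               # False
print(is_well_formed(corrected))                 # True
print(is_well_formed("line one<br>line two"))    # False: br is never closed
print(is_well_formed("line one<br />line two"))  # True
```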

All XHTML elements must be written in lower case

Unlike HTML, which is case-insensitive, XHTML requires all element and attribute names to be written in lower-case letters. This is because XML is case-sensitive, so <p> and <P> are different tags. The following markup is invalid XHTML:


<P>This is invalid because there is no 'P' element in XHTML;
there is only a 'p' element.</P>

You should also use lower-case element and attribute names in any CSS style sheets attached to your XHTML pages.
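The case-sensitivity of XML can be demonstrated with a short sketch, again using Python's xml.etree.ElementTree. The helper name parses is made up for this example. An XML parser will happily accept <P>...</P> as well-formed, but it reports a different element name from <p>, which is why the XHTML DTD (which only defines p) rejects it.

```python
import xml.etree.ElementTree as ET

def parses(markup):
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

upper = ET.fromstring("<P>text</P>")
lower = ET.fromstring("<p>text</p>")

# The parser keeps the names exactly as written: 'P' and 'p' are distinct.
print(upper.tag, lower.tag, upper.tag == lower.tag)  # P p False

# Mixing cases in one element is not even well-formed.
print(parses("<P>text</p>"))  # False
```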

All attribute values must be quoted

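HTML allows you to omit the quotes around an attribute value when the value contains only letters, digits, hyphens, periods, underscores and colons, so numeric values are often written unquoted. With XHTML, all attribute values must have quotes around them, even numeric values: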


Invalid XHTML:
<td colspan=2>

Valid XHTML:
<td colspan="2">

Attribute minimization is not allowed

In HTML, some attributes are usually written without a corresponding value — for example:


<input type="checkbox" checked>
<option value="Fred" selected>

In XHTML, this is forbidden; all attributes must have a corresponding value. Rewrite such attributes in XHTML as follows:


<input type="checkbox" checked="checked" />
<option value="Fred" selected="selected">Fred</option>
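Both attribute rules (quoted values and no minimization) come straight from XML syntax, so an XML parser can be used to check them. The following is a minimal sketch using Python's xml.etree.ElementTree; the helper name parses is an assumption for this example, and the td and input snippets have been closed so that only the attribute syntax is being tested.

```python
import xml.etree.ElementTree as ET

def parses(markup):
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Unquoted attribute values are rejected, even numeric ones.
print(parses('<td colspan=2></td>'))    # False
print(parses('<td colspan="2"></td>'))  # True

# Minimized attributes (a name with no value) are rejected too.
print(parses('<input type="checkbox" checked />'))            # False
print(parses('<input type="checkbox" checked="checked" />'))  # True

# Once parsed, every attribute value is available as a plain string.
checkbox = ET.fromstring('<input type="checkbox" checked="checked" />')
print(checkbox.get("checked"))  # checked
```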

The id fragment identifier should be used instead of name

HTML allows you to define a fragment (a section of markup) within the page and create a link to it as follows:


<a name="top"> </a>

...

<a href="#top">Top of page</a>

In XHTML, you should use the id attribute instead of name. However, to ensure compatibility with current browsers, it's wise to include both id and name attributes, as follows:


<a id="top" name="top"> </a>

...

<a href="#top">Top of page</a>

(name is still allowed in XHTML 1.0, although it's deprecated.)
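Since id values are ordinary attributes in the parsed document, it's straightforward to check that every in-page link points at an existing id. Here is a rough sketch using Python's xml.etree.ElementTree; the function name unresolved_fragments and the sample markup (including the deliberately broken #end link) are made up for this example.

```python
import xml.etree.ElementTree as ET

def unresolved_fragments(xhtml_text):
    """Return the in-page hrefs (such as '#top') that match no id in the page."""
    root = ET.fromstring(xhtml_text)
    # Collect every id value that appears anywhere in the document.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    missing = []
    for el in root.iter():
        href = el.get("href", "")
        if href.startswith("#") and href[1:] not in ids:
            missing.append(href)
    return missing

page = (
    '<body>'
    '<a id="top" name="top"> </a>'
    '<p><a href="#top">Top of page</a> <a href="#end">End of page</a></p>'
    '</body>'
)

print(unresolved_fragments(page))  # ['#end']
```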

Ampersands (&) on their own are not allowed

In both XHTML and HTML, the ampersand (&) marks the start of an entity reference. For example, &copy; displays the copyright symbol (©), while &amp; displays the ampersand itself (&).

Many HTML browsers interpret an ampersand on its own (&) as a literal ampersand. In XHTML, this is forbidden; if you want to indicate an ampersand, you must encode it as &amp;. This is true even within URLs — for example:


Invalid XHTML:
<a href="/cgi-bin/script.cgi?name=matt&company=elated">

Valid XHTML:
<a href="/cgi-bin/script.cgi?name=matt&amp;company=elated">
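The sketch below shows how an XML parser treats the two links above, and one way to escape a URL before writing it into an attribute. It uses Python's standard xml.etree.ElementTree and xml.sax.saxutils.escape; the helper name parses and the closed-off a elements are assumptions for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def parses(markup):
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

bad = '<a href="/cgi-bin/script.cgi?name=matt&company=elated">link</a>'
good = '<a href="/cgi-bin/script.cgi?name=matt&amp;company=elated">link</a>'

print(parses(bad))   # False: the bare & starts an unterminated entity reference
print(parses(good))  # True

# The parser decodes &amp; back to a plain ampersand in the attribute value.
print(ET.fromstring(good).get("href"))  # /cgi-bin/script.cgi?name=matt&company=elated

# escape() converts &, < and > so text can be placed safely in markup.
print(escape("name=matt&company=elated"))  # name=matt&amp;company=elated
```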

XHTML documents must contain certain items

HTML 4 isn't too fussy about exactly which elements you include in your page. XHTML is somewhat stricter. All XHTML documents must contain, at an absolute minimum:

  • A DOCTYPE declaration. Currently available XHTML 1.0 DOCTYPEs are xhtml1-strict, xhtml1-transitional, and xhtml1-frameset.
  • An html element. This must also be the root (top-level) element.
  • An xmlns XHTML namespace declaration. This must appear as an attribute of the html element, with the value http://www.w3.org/1999/xhtml.

For example, here is a minimal XHTML Strict page template:


<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

  <head>
    <title></title>
  </head>

  <body>
  </body>

</html>

Note that the head, title and body elements are also required in this minimal document, because they are required child elements of the html element.
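The required items can be checked programmatically. The following is a rough sketch, using Python's xml.etree.ElementTree, that reports which of the required parts are missing from a page. The names MINIMAL_PAGE and missing_required_parts are made up for this example, the DOCTYPE test is a crude text search, and the function is no substitute for validating against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

MINIMAL_PAGE = """<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title></title>
  </head>
  <body>
  </body>
</html>"""

def missing_required_parts(xhtml_text):
    """Return a list naming each required item that the page lacks."""
    problems = []
    # Crude textual check: ElementTree does not expose the DOCTYPE directly.
    if "<!DOCTYPE html" not in xhtml_text:
        problems.append("DOCTYPE")
    root = ET.fromstring(xhtml_text)
    # After parsing, the root's name includes its namespace in braces.
    if root.tag != "{%s}html" % XHTML:
        problems.append("html root in XHTML namespace")
    for path in ("head", "head/title", "body"):
        qualified = "/".join("{%s}%s" % (XHTML, name) for name in path.split("/"))
        if root.find(qualified) is None:
            problems.append(path)
    return problems

print(missing_required_parts(MINIMAL_PAGE))  # []
```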

We've covered the main differences between XHTML and HTML here. There are a few more differences — most of them subtle — which we'll explore in future articles.

You've now learned what XHTML is and why it's useful, and you've also taken a look at the key differences between XHTML and HTML. Future articles in this series will show you how to build XHTML pages and how to convert existing HTML pages to XHTML. You'll also take a look at some of the compatibility issues surrounding XHTML, and how to overcome them. Stay tuned! :)

In the meantime, here's a great XHTML tutorial that goes into further detail on the subject. For even more detail, check out the XHTML 1.0 specification.
