XHTML, first introduced in 2000, is billed as the successor to HTML. It’s short for Extensible Hypertext Markup Language. XHTML 1.0 is essentially a reworking of HTML 4 in XML – Extensible Markup Language. As such, HTML 4 and XHTML 1.0 are very similar.
XHTML is stricter than regular HTML, as you’ll see in a moment. While this extra strictness requires a bit more effort when creating XHTML pages, it does mean that those pages are very easy for computers to read. HTML, in contrast, is notoriously difficult for browsers to interpret — which is partly why no two browsers seem to display a Web page in the same way!
Advantages of XHTML
XHTML offers many advantages over HTML. Here are a few important ones:
- No more badly-written “tag soup” pages. XHTML ensures that your Web pages are well-formed. This means that the markup contains no errors or ambiguities, and is structured correctly.
- XHTML pages are readable by more devices. Because XHTML pages are well-formed, they can be more easily read by simple browsers, such as those in mobile phones and PDAs, as well as by standard HTML browsers.
- It’s easy to extract semantic information from XHTML pages. As XHTML is XML, it can be easily processed by any XML parser, making it easy to automatically extract useful info from your XHTML pages.
- It’s possible to add other XML content to an XHTML page. By using XML namespaces, you can “mix and match” plain XHTML with other XML markup – for example, MathML – allowing you to produce rich, semantic Web pages.
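As a rough illustration of the point about XML parsers, here is a minimal Python sketch that pulls the title and links out of an XHTML page using only the standard library. The page content, the `extract_links` and `extract_title` helper names, and the `x` namespace prefix are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page, used only for illustration.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example page</title></head>
  <body>
    <p>See <a href="/about.html">About</a> and <a href="/contact.html">Contact</a>.</p>
  </body>
</html>"""

# Elements in an XHTML page live in the XHTML namespace, so searches must name it.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_links(xhtml_text):
    """Return (href, link text) pairs for every a element in the page."""
    root = ET.fromstring(xhtml_text)
    return [(a.get("href"), a.text) for a in root.iterfind(".//x:a", XHTML_NS)]


def extract_title(xhtml_text):
    """Return the text of the title element, or None if there isn't one."""
    root = ET.fromstring(xhtml_text)
    return root.findtext("x:head/x:title", namespaces=XHTML_NS)


print(extract_title(PAGE))
print(extract_links(PAGE))
```

The same code would fail with a parse error on a typical "tag soup" HTML page, which is the practical difference the list above describes.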
Key differences between XHTML and HTML
The current widely-used version of XHTML is version 1.0. Essentially, it’s a stricter version of HTML 4.01.
The main differences between HTML 4.01 and XHTML 1.0 are as follows.
XHTML documents must be well-formed
Every XHTML page you create needs to be well-formed. This means that all elements in an XHTML page must be closed and properly nested. For example, the following markup is invalid XHTML, because the b and i elements aren't properly nested (their start and end tags overlap), and the p elements don't have end tags:
<p>The quick <b>brown <i>fox jumps</b> over</i> the lazy dog.
<p>Every good boy deserves fruit.
Here’s a corrected version that validates to XHTML 1.0:
<p>The quick <b>brown <i>fox jumps</i> over</b> the lazy dog.</p>
<p>Every good boy deserves fruit.</p>
All XHTML elements must be closed, even elements that can't be closed in HTML. For example, you can't write just <br> in XHTML; you have to write either <br></br> or, more conveniently, <br />. (The latter format is known as minimized tag syntax, and is the preferred way to write empty elements, that is, elements that don't have an end tag in HTML.)
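One way to see these rules in action is to hand the markup to an XML parser, which rejects anything that isn't well-formed. The sketch below uses Python's standard library; the `is_well_formed` helper is a hypothetical name, and it wraps each fragment in a div only so that the parser sees a single root element. Note that it checks well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a single root element."""
    try:
        ET.fromstring("<div>" + fragment + "</div>")
        return True
    except ET.ParseError:
        return False


# Overlapping b/i elements and unclosed p elements.
BAD = ("<p>The quick <b>brown <i>fox jumps</b> over</i> the lazy dog. "
       "<p>Every good boy deserves fruit.")

# Properly nested, with every element closed.
GOOD = ("<p>The quick <b>brown <i>fox jumps</i> over</b> the lazy dog.</p> "
        "<p>Every good boy deserves fruit.</p>")

print(is_well_formed(BAD))                       # False
print(is_well_formed(GOOD))                      # True
print(is_well_formed("line one<br>line two"))    # False: br is never closed
print(is_well_formed("line one<br />line two"))  # True: minimized tag syntax
```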
All XHTML elements must be written in lower case
HTML tag names are case-insensitive, but XHTML element and attribute names must always be written in lower-case letters, because XML is case-sensitive. The following markup is invalid XHTML:
<P>This is invalid because there is no 'P' element in XHTML; there is only a 'p' element.</P>
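The case-sensitivity of XML can be observed with any XML parser. In this small Python sketch (the `parses` and `tag_names` helpers are hypothetical names for this example), P and p come back as two different element names, and a start tag closed with a differently-cased end tag is rejected outright.

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def tag_names(markup):
    """Return the element names in document order, exactly as the parser sees them."""
    return [element.tag for element in ET.fromstring(markup).iter()]


# The parser keeps 'P' and 'p' apart; it does not fold case the way HTML browsers do.
print(tag_names("<div><P>a</P><p>b</p></div>"))  # ['div', 'P', 'p']

# Mixing cases between the start and end tag is a well-formedness error.
print(parses("<P>text</p>"))  # False
```

Note that `<P>text</P>` on its own is still well-formed XML. It is invalid XHTML because the XHTML DTD defines no element called P.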
All attribute values must be quoted
HTML allows you to specify numeric attributes without quotes. With XHTML, all attribute values must have quotes around them — even numeric values:
Invalid XHTML: <td colspan=2>
Valid XHTML: <td colspan="2">
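As a quick check of the quoting rule, the following Python sketch feeds both forms to an XML parser. The `colspan_of` helper is a hypothetical name; it returns None when the cell cannot be parsed (or has no colspan attribute).

```python
import xml.etree.ElementTree as ET


def colspan_of(cell_markup):
    """Parse a single td element and return its colspan value, or None on failure."""
    try:
        cell = ET.fromstring(cell_markup)
    except ET.ParseError:
        return None
    return cell.get("colspan")


print(colspan_of('<td colspan=2>x</td>'))    # None: unquoted value is rejected
print(colspan_of('<td colspan="2">x</td>'))  # 2
```

The value comes back as the string "2", not a number. XML attribute values are always text, which is one reason the quotes are mandatory.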
Attribute minimization is not allowed
In HTML, some attributes are usually written without a corresponding value — for example:
<input type="checkbox" checked>
<option value="Fred" selected>
In XHTML, this is forbidden; all attributes must have a corresponding value. Rewrite such attributes in XHTML as follows:
<input type="checkbox" checked="checked" />
<option value="Fred" selected="selected" />
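If you have existing HTML to convert, expanding minimized attributes by hand gets tedious. The sketch below shows one possible approach using Python's standard `html.parser` module, which reports a minimized attribute with a value of None. The `StartTagRewriter` class and `rewrite_start_tags` function are hypothetical names, and the code only re-serializes start tags; it does not close empty elements or handle the rest of the document.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class StartTagRewriter(HTMLParser):
    """Collect each start tag, re-serialized with explicit, quoted attribute values."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # Minimized attribute such as 'checked': repeat the name as the value.
                value = name
            parts.append(f"{name}={quoteattr(value)}")
        self.tags.append("<" + " ".join(parts) + ">")


def rewrite_start_tags(html_text):
    """Return the start tags found in html_text with XHTML-style attributes."""
    rewriter = StartTagRewriter()
    rewriter.feed(html_text)
    rewriter.close()
    return rewriter.tags


for tag in rewrite_start_tags('<input type="checkbox" checked> <option value="Fred" selected>'):
    print(tag)
```

This prints `<input type="checkbox" checked="checked">` and `<option value="Fred" selected="selected">`. Unquoted values such as colspan=2 come out quoted as well, since every value passes through `quoteattr`.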
The id fragment identifier should be used instead of name
HTML allows you to define a fragment (a section of markup) within the page and create a link to it as follows:
<a name="top"> </a> ... <a href="#top">Top of page</a>
In XHTML, you should use the id attribute instead of name. However, to ensure compatibility with current browsers, it's wise to include both id and name attributes, as follows:
<a id="top" name="top"> </a> ... <a href="#top">Top of page</a>
(name is still allowed in XHTML 1.0, although it's deprecated.)
Ampersands (&) on their own are not allowed
In both XHTML and HTML, the ampersand (&) is used to begin an entity reference. For example, &copy; displays the copyright symbol (©), while &amp; displays the ampersand itself (&).
Many HTML browsers interpret an ampersand on its own (&) as a literal ampersand. In XHTML, this is forbidden; if you want to indicate an ampersand, you must encode it as &amp;. This is true even within URLs, for example:
Invalid XHTML: <a href="/cgi-bin/script.cgi?name=matt&company=elated">
Valid XHTML: <a href="/cgi-bin/script.cgi?name=matt&amp;company=elated">
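If your pages are generated by a script, it's safer to let a library do the encoding than to remember it by hand. Here is a minimal Python sketch using the standard library's `escape` and `quoteattr` functions; the `link` and `parses` helpers are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def parses(markup):
    """Return True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def link(url, text):
    """Build an a element; quoteattr and escape encode & as &amp; where needed."""
    return f"<a href={quoteattr(url)}>{escape(text)}</a>"


URL = "/cgi-bin/script.cgi?name=matt&company=elated"

# A bare ampersand in the attribute value makes the markup ill-formed.
print(parses('<a href="' + URL + '">Run script</a>'))  # False

# The encoded version parses, and the parser hands back the original URL.
encoded = link(URL, "Run script")
print(encoded)
print(ET.fromstring(encoded).get("href") == URL)  # True
```

The last line shows that encoding the ampersand doesn't change the URL itself: the parser decodes &amp; back to & before the link is followed.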
XHTML documents must contain certain items
HTML 4 isn’t too fussy about exactly which elements you include in your page. XHTML is somewhat stricter. All XHTML documents must contain, at an absolute minimum:
- A DOCTYPE declaration. Currently available XHTML 1.0 DOCTYPEs are Strict, Transitional, and Frameset.
- An html element. This must also be the root (top-level) element.
- An xmlns XHTML namespace declaration. This must appear within the html element.
For example, here is a minimal XHTML Strict page template:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title></title>
  </head>
  <body>
  </body>
</html>
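The minimum requirements listed above can be checked mechanically. The following Python sketch is a rough check only, not a substitute for a real validator: the `check_minimum` function is a hypothetical helper, the DOCTYPE test is a simple text search, and nothing is validated against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

TEMPLATE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title></title>
  </head>
  <body>
  </body>
</html>"""


def check_minimum(xhtml_text):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    # Crude text test: look for a DOCTYPE before the html start tag.
    if "<!DOCTYPE html" not in xhtml_text.split("<html", 1)[0]:
        problems.append("missing DOCTYPE declaration")
    try:
        root = ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        return problems + [f"not well-formed: {err}"]
    # ElementTree reports namespaced names as '{namespace}localname'.
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root element is not html in the XHTML namespace")
    return problems


print(check_minimum(TEMPLATE))  # []
print(check_minimum("<html><head><title></title></head><body></body></html>"))
```

The second call reports both a missing DOCTYPE and a root element outside the XHTML namespace, because the xmlns declaration has been left off.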
We’ve covered the main differences between XHTML and HTML here. There are a few more differences — most of them subtle — which we’ll explore in future articles.
You’ve now learned what XHTML is and why it’s useful, and you’ve also taken a look at the key differences between XHTML and HTML. Future articles in this series will show you how to build XHTML pages and how to convert existing HTML pages to XHTML. You’ll also take a look at some of the compatibility issues surrounding XHTML, and how to overcome them. Stay tuned! 🙂