As we’ve seen in other articles in this series, XHTML offers a lot of advantages over HTML. It’s stricter, cleaner, more robust, and extensible. However, some modern browsers such as Internet Explorer don’t understand XHTML; they only understand HTML 4. Fortunately, the two languages are very similar and, by bearing a few guidelines in mind, you can write valid XHTML documents that can be processed correctly by HTML 4 browsers.
A note about media types
When a Web server sends a page to a browser, it also sends various HTTP headers. One of these headers specifies the media type, or “MIME type”, of the page. The usual media type for HTML documents is text/html, while the media type recommended for XHTML documents is application/xhtml+xml.
Unfortunately, Internet Explorer (up to and including version 7) doesn’t understand the application/xhtml+xml media type (indeed, it doesn’t understand XHTML at all), which means your safest bet is to send your XHTML pages as if they were HTML, by using the media type text/html. If your XHTML files have .html or .htm filename extensions, then your Web server is probably already doing this. Also, most server-side scripting engines such as PHP and ASP send pages as text/html by default.
However, because you’re sending your XHTML pages with the HTML media type, all browsers (IE, Firefox, Safari and Opera alike) are forced to interpret your markup as HTML, not XHTML. (In the future, once all browsers understand XHTML, you’ll be able to serve your XHTML documents as “true” XHTML, using the application/xhtml+xml media type.) This means you need to ensure that the XHTML pages you currently write are backward-compatible with HTML 4 wherever possible. This tutorial shows you how to do this.
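To get a feel for how a filename extension usually maps to a media type, here is a minimal Python sketch built on the standard mimetypes module. It only stands in for whatever lookup your own Web server performs, and the helper name media_type_for is made up for this example.

```python
import mimetypes


def media_type_for(filename):
    """Return the media type guessed from the filename's extension."""
    media_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return media_type or "application/octet-stream"


# Both common HTML extensions map to the HTML media type.
print(media_type_for("index.html"))
print(media_type_for("about.htm"))
```

If your pages use one of these extensions, a server that works along these lines will send them as text/html without any extra configuration.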
Don’t use XML declarations
All XML documents – including XHTML – can have an optional XML declaration at the top of the document that specifies the version of XML used and, optionally, the character encoding of the document. For example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My Web Page</title>
</head>
<body>
</body>
</html>
However, some current browsers mistakenly render the declaration, making it visible to viewers of the page. Even worse, Internet Explorer 6 enters quirks mode when it sees an XML declaration, which can cause all sorts of layout issues if you’re not careful.
So it’s best to avoid using XML declarations at this time. You can instead use a meta element in the document head to specify the character encoding:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
<title>My Web Page</title>
</head>
<body>
</body>
</html>
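If you have existing pages that start with an XML declaration, a small script can remove it for you. The following Python sketch assumes the declaration sits at the very start of the file with no byte order mark in front of it; the function name strip_xml_declaration is hypothetical.

```python
import re

# An XML declaration at the very start of the document, plus any white
# space that follows it. The \s after "xml" keeps other processing
# instructions such as <?xml-stylesheet ...?> from matching.
XML_DECLARATION = re.compile(r"\A<\?xml\s[^>]*\?>\s*")


def strip_xml_declaration(document):
    """Return the document without its leading XML declaration, if any."""
    return XML_DECLARATION.sub("", document, count=1)


page = '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html>\n<html></html>'
print(strip_xml_declaration(page))
```

Remember to add the meta element shown above if the declaration was the only place where the character encoding was stated.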
Add a space at the end of empty elements
Some older browsers get confused by empty elements such as <br/> and <img src="mypic.jpg" alt="Pic"/>. To increase the chances of these browsers understanding empty elements, put a space before the trailing slash:
<br />
<img src="mypic.jpg" alt="Pic" />
Always write empty elements in minimised form
Some old browsers also have issues with empty elements written in long-hand. For example, the following valid single XHTML element may well be rendered as two line breaks on some browsers:
<br></br>
To avoid this issue, make sure you write all empty elements, such as br and hr, in minimised form:
<br />
<hr />
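The two empty-element guidelines above can be applied mechanically. Here is a rough Python sketch that rewrites both <br/> and <br></br> as <br />. It is regex-based, so treat it as an illustration rather than a full parser: it assumes lowercase tag names and attribute values that contain no < or > characters. The names EMPTY_ELEMENTS and normalise_empty_elements are invented for this example.

```python
import re

# Elements declared EMPTY in the XHTML 1.0 Strict DTD. The Transitional
# and Frameset DTDs add a few more (basefont, isindex, frame).
EMPTY_ELEMENTS = ("area", "base", "br", "col", "hr",
                  "img", "input", "link", "meta", "param")
NAMES = "|".join(EMPTY_ELEMENTS)

# Long-hand form, for example <br></br> or <hr class="x"></hr>.
LONG_FORM = re.compile(r"<(%s)\b([^<>]*?)\s*></\1\s*>" % NAMES)
# Minimised form with or without a space before the slash.
MINIMISED = re.compile(r"<(%s)\b([^<>]*?)\s*/>" % NAMES)


def normalise_empty_elements(markup):
    """Write every empty element as <name attributes />."""
    markup = LONG_FORM.sub(r"<\1\2 />", markup)
    return MINIMISED.sub(r"<\1\2 />", markup)


print(normalise_empty_elements('<p>One<br/>Two<br></br>Three</p>'))
```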
Never write non-empty elements in minimised form
In XHTML, you can write non-empty elements in minimised form if they don’t happen to contain any content:
<title />
<p />
<span />
Many current browsers can’t cope with this, however. To be on the safe side, always write such elements with both a start and an end tag, even if they don’t contain any content:
<title></title>
<p></p>
<span></span>
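The reverse fix, expanding minimised non-empty elements into a start tag and an end tag, can be sketched in Python in much the same way. This is only an approximation: it works on raw text, so it could also touch tag-like strings inside scripts or CDATA sections. The names EMPTY_ELEMENTS and expand_minimised are hypothetical.

```python
import re

# Elements that are genuinely empty and should stay minimised
# (the XHTML 1.0 Strict set; extend it for other DTDs).
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr",
                  "img", "input", "link", "meta", "param"}

# Any minimised tag: a name, optional attributes, then "/>".
MINIMISED_TAG = re.compile(r"<([A-Za-z][\w:.-]*)([^<>]*?)\s*/>")


def expand_minimised(markup):
    """Rewrite <p /> as <p></p>, leaving empty elements untouched."""
    def replace(match):
        name, attributes = match.group(1), match.group(2)
        if name in EMPTY_ELEMENTS:
            return match.group(0)  # <br /> and friends stay as they are
        return "<%s%s></%s>" % (name, attributes, name)
    return MINIMISED_TAG.sub(replace, markup)


print(expand_minimised('<title /><p /><span class="note"/><br />'))
```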
Avoid extraneous white space in attribute values
XHTML allows you to place extra white space characters within attribute values. (Leading and trailing white space characters are removed during processing.)
<img src="rifle.jpg" alt="
This is my rifle.
There are many like it, but this one is mine." />
However, this can have unexpected results with current HTML browsers. To be on the safe side, don’t use extra white space:
<img src="rifle.jpg" alt="This is my rifle. There are many like it, but this one is mine." />
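If your attribute values are generated by a script, you can tidy the white space before writing them out. A minimal Python sketch, with collapse_white_space as a made-up helper name:

```python
def collapse_white_space(value):
    """Strip leading/trailing white space and collapse inner runs to one space."""
    # str.split() with no arguments splits on any run of white space
    # (spaces, tabs, line breaks) and discards empty leading/trailing pieces.
    return " ".join(value.split())


alt_text = """
This is my rifle.
There are many like it, but this one is mine."""
print(collapse_white_space(alt_text))
```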
If using the xml:lang attribute, use lang as well
When using xml:lang to specify the language of an element, add a lang attribute as well for the benefit of HTML browsers. (XHTML browsers will read the xml:lang attribute in preference.) For example:
<p xml:lang="fr" lang="fr">Je pense, donc je suis</p>
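To spot places where you forgot the matching lang attribute, you could run a quick check over your markup. The sketch below uses Python’s standard html.parser module, which reads the page roughly the way an HTML browser would; the class and function names are invented for this example.

```python
from html.parser import HTMLParser


class LangChecker(HTMLParser):
    """Collect tags that carry xml:lang without an identical lang value."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "xml:lang" in attributes and attributes.get("lang") != attributes["xml:lang"]:
            self.problems.append(tag)


def tags_missing_lang(markup):
    """Return the names of tags whose lang is missing or differs from xml:lang."""
    checker = LangChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(tags_missing_lang('<p xml:lang="fr">Je pense, donc je suis</p>'))
```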
Specify fragments with name as well as id
In XHTML, fragment identifiers (for example, <a href="#top">...</a>) reference fragments specified using ids (for example, <a id="top"></a>). (For more on this see our Introducing XHTML article.) In HTML 4, however, the name attribute is used to specify fragments. So to ensure compatibility with HTML 4 browsers, use name as well as id:
<a id="top" name="top"></a>
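A similar check can list anchors that have an id but no matching name. Again this is only a sketch using Python’s standard html.parser module; AnchorChecker and anchors_missing_name are hypothetical names, and only a elements are examined.

```python
from html.parser import HTMLParser


class AnchorChecker(HTMLParser):
    """Collect the ids of <a> elements that lack an identical name attribute."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "id" in attributes:
            if attributes.get("name") != attributes["id"]:
                self.problems.append(attributes["id"])


def anchors_missing_name(markup):
    """Return the id of every anchor whose name is missing or different."""
    checker = AnchorChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(anchors_missing_name('<a id="top"></a><a id="end" name="end"></a>'))
```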
Don’t use &apos;
The &apos; character reference, which refers to an apostrophe, was introduced in XML 1.0 and is therefore available in XHTML, but it isn’t defined in HTML 4. To make your XHTML documents HTML 4-compatible, replace &apos; with &#39; (the numeric equivalent).
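Swapping the named apostrophe reference (written &apos; in XML) for its numeric form &#39; is a plain text substitution. A minimal Python sketch follows, with replace_apos as a made-up function name; note that it replaces every occurrence, including any inside scripts or CDATA sections.

```python
def replace_apos(markup):
    """Replace the XML-only &apos; reference with the numeric &#39;."""
    return markup.replace("&apos;", "&#39;")


print(replace_apos("<p>It&apos;s compatible now.</p>"))
```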
Further reading
The XHTML 1.0 specification includes a list of HTML compatibility guidelines for writing HTML-compatible XHTML. As well as covering the points mentioned in this article, the guidelines also offer advice for writing HTML pages that will work on both HTML browsers and XHTML browsers.