Writing HTML-compatible XHTML

XHTML may be the Web markup language of the future, but many browsers these days can only understand HTML. What can you do to ensure that your XHTML documents work correctly on current browsers?

As we've seen in other articles in this series, XHTML offers a lot of advantages over HTML. It's stricter, cleaner, more robust, and extensible. However, some modern browsers such as Internet Explorer don't understand XHTML; they only understand HTML 4. Fortunately, the two languages are very similar and, by bearing a few guidelines in mind, you can write valid XHTML documents that can be processed correctly by HTML 4 browsers.

A note about media types

When a Web server sends a page to a browser, it also sends along various HTTP headers with it. One of these headers specifies the media type, or "MIME type", of the page. The usual media type for HTML documents is text/html, while the media type recommended for XHTML documents is application/xhtml+xml.

Unfortunately, Internet Explorer (up to and including version 7) doesn't understand the application/xhtml+xml media type (indeed, it doesn't understand XHTML at all), which means your safest bet is to send your XHTML pages as if they were HTML, by using the media type text/html. If your XHTML pages end with .html or .htm filename extensions, then your Web server is probably already doing this. Also, most server-side scripting engines such as PHP and ASP send pages as text/html by default.
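If you generate pages from your own script, you can set the media type explicitly rather than rely on the default. The following is only a minimal sketch, assuming a Python WSGI application; the names PAGE and application are made up for this illustration and are not part of any particular server setup.

```python
# Hypothetical page content; any valid XHTML document would do here.
PAGE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '  <head><title>My Web Page</title></head>\n'
    '  <body></body>\n'
    '</html>\n'
)


def application(environ, start_response):
    """Serve the XHTML page using the HTML media type, text/html."""
    body = PAGE.encode("utf-8")
    headers = [
        # The media type is sent in the Content-Type HTTP header.
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, you could pass application to wsgiref.simple_server.make_server from the Python standard library and call serve_forever() on the result.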

However, because you're sending your XHTML pages with the HTML media type, all browsers — IE, Firefox, Safari and Opera alike — are forced to interpret your markup as HTML, not XHTML. (In the future, once all browsers understand XHTML, you'll be able to serve your XHTML documents as "true" XHTML, using the application/xhtml+xml media type.) This means you need to ensure that the XHTML pages you currently write are backward-compatible with HTML 4 wherever possible. This tutorial shows you how to do this.

Don't use XML declarations

All XML documents - including XHTML - can have an optional XML declaration at the top of the document that specifies the version of XML used and, optionally, the character encoding of the document. For example:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>My Web Page</title>
  </head>
  <body>
  </body>
</html>

However, some current browsers mistakenly render the declaration, making it visible to viewers of the page. Even worse, Internet Explorer 6 enters quirks mode when it sees an XML declaration, which can cause all sorts of layout issues if you're not careful.

So it's best to avoid using XML declarations at this time. You can instead use a meta element in the document head to specify the character encoding:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
    <title>My Web Page</title>
  </head>
  <body>
  </body>
</html>
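If you have existing pages that start with an XML declaration, a small script can strip it for you. This is a rough sketch in Python, not a full XML parser; the function name strip_xml_declaration is an assumption made for this example. Remember to add the meta element shown above if the declaration was the only place the encoding was specified.

```python
import re

# An XML declaration, if present, sits at the very start of the document.
XML_DECLARATION = re.compile(r"<\?xml\s[^>]*\?>\s*")


def strip_xml_declaration(document):
    """Return the document without a leading XML declaration, if it has one."""
    match = XML_DECLARATION.match(document)
    if match is None:
        return document  # nothing to remove
    return document[match.end():]
```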

Add a space at the end of empty elements

Some older browsers get confused by empty elements such as <br/> and <img src="mypic.jpg" alt="Pic"/>. To increase the chances of these browsers understanding empty elements, put a space before the trailing slash:


<br />
<img src="mypic.jpg" alt="Pic" />

Always write empty elements in minimised form

Some old browsers also have issues with empty elements written in long-hand. For example, the following single, valid XHTML element may well be rendered as two line breaks in some browsers:


<br></br>

To avoid this issue, make sure you write all empty elements, such as br and hr, in minimised form:


<br />
<hr />
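The two guidelines on empty elements can be applied to existing markup mechanically. Here is a minimal sketch in Python that uses regular expressions rather than a real parser, so treat it as a starting point only. The function name and the list of empty elements (taken from HTML 4) are choices made for this example, and unusual input, such as attribute values containing angle brackets, is not handled.

```python
import re

# Elements that HTML 4 defines as empty.
EMPTY_ELEMENTS = "area|base|br|col|hr|img|input|link|meta|param"

# Long-hand form such as <br></br>, optionally with attributes.
LONG_HAND = re.compile(r"<(" + EMPTY_ELEMENTS + r")\b([^<>]*?)\s*>\s*</\1\s*>")
# Minimised form such as <br/> or <br />, optionally with attributes.
MINIMISED = re.compile(r"<(" + EMPTY_ELEMENTS + r")\b([^<>]*?)\s*/>")


def normalise_empty_elements(markup):
    """Write empty elements in minimised form, with a space before the slash."""
    markup = LONG_HAND.sub(r"<\1\2 />", markup)   # <br></br> becomes <br />
    return MINIMISED.sub(r"<\1\2 />", markup)     # <br/> becomes <br />
```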

Never write non-empty elements in minimised form

In XHTML, you can write non-empty elements in minimised form if they don't happen to contain any content:


<title />
<p />
<span />

Many current browsers can't cope with this, however. To be on the safe side, always write such elements with both a start and an end tag, even if they don't contain any content:


<title></title>
<p></p>
<span></span>
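Going the other way, you can expand minimised non-empty elements into start and end tags. Again, this is only a regex-based sketch in Python under the same caveats as any non-parser approach; expand_non_empty is a name invented for this example, and the set of empty elements is the HTML 4 list.

```python
import re

# Elements that HTML 4 defines as empty; these should stay minimised.
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr", "img",
                  "input", "link", "meta", "param"}

# Any minimised tag: an element name, optional attributes, then "/>".
MINIMISED_TAG = re.compile(r"<([A-Za-z][\w:.-]*)([^<>]*?)\s*/>")


def expand_non_empty(markup):
    """Rewrite minimised non-empty elements with a start and an end tag."""
    def replace(match):
        name, attributes = match.group(1), match.group(2)
        if name in EMPTY_ELEMENTS:
            return match.group(0)  # leave <br /> and friends alone
        return "<{0}{1}></{0}>".format(name, attributes)
    return MINIMISED_TAG.sub(replace, markup)
```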

Avoid extraneous white space in attribute values

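XHTML allows you to place extra white space characters, including line breaks, within attribute values. (According to the XHTML 1.0 specification, leading and trailing white space is stripped during processing, and each run of white space inside the value is mapped to a single space.)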


<img src="rifle.jpg" alt="
  This is my rifle.
  There are many like it, but this one is mine." />

However, this can have unexpected results with current HTML browsers. To be on the safe side, don't use extra white space:


<img src="rifle.jpg" alt="This is my rifle. There are many like it, but this one is mine." />
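To see what a cleaned-up value should look like, you can mimic the white space handling that the XHTML 1.0 specification describes for attribute values. This is a small Python sketch; the function name normalise_attribute_value is made up for illustration.

```python
def normalise_attribute_value(value):
    """Strip leading and trailing white space and collapse inner runs.

    str.split() with no arguments splits on any run of white space
    (spaces, tabs, line breaks) and discards empty strings.
    """
    return " ".join(value.split())
```

Applied to the multi-line alt text above, this produces the single-line version, which is the form to write in your markup in the first place.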

If using the xml:lang attribute, use lang as well

When using xml:lang to specify the language of an element, add a lang attribute as well for the benefit of HTML browsers. (XHTML browsers will read the xml:lang attribute in preference.) For example:


<p xml:lang="fr" lang="fr">Je pense, donc je suis</p>
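If you want to check a page for elements that have xml:lang but no lang, a short script can do it. The sketch below assumes Python and its standard html.parser module, which reads the markup the way an HTML tool would; the class and function names are inventions for this example.

```python
from html.parser import HTMLParser


class LangChecker(HTMLParser):
    """Collect the names of elements that have xml:lang but no lang."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag.
        names = {name for name, _ in attrs}
        if "xml:lang" in names and "lang" not in names:
            self.missing.append(tag)


def elements_missing_lang(markup):
    """Return the tag names of elements that need a lang attribute added."""
    checker = LangChecker()
    checker.feed(markup)
    checker.close()
    return checker.missing
```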

Specify fragments with name as well as id

In XHTML, fragment identifiers (for example, <a href="#top">...</a>) reference fragments specified using ids (for example, <a id="top"></a>). (For more on this see our Introducing XHTML article.) In HTML 4, however, the name attribute is used to specify fragments. So to ensure compatibility with HTML 4 browsers, use name as well as id:


<a id="top" name="top"></a>

While using the name attribute in this way is valid in XHTML 1.0, it is deprecated, and will be removed from future versions of XHTML.

Don't use &apos;

The &apos; entity reference, which refers to an apostrophe, was introduced in XML 1.0 and is therefore available in XHTML 1.0, but it's not valid HTML 4. Therefore, to make your XHTML documents HTML 4-compatible, you should replace &apos; with &#39; (the numeric equivalent).
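For existing pages, this replacement is a simple text substitution. Here is a minimal Python sketch; replace_apos is a name chosen for this example.

```python
def replace_apos(markup):
    """Swap the XML-only &apos; for its numeric equivalent, &#39;."""
    return markup.replace("&apos;", "&#39;")
```

If you escape text with Python's html.escape function, apostrophes come out as &#x27;, the hexadecimal form of the same numeric reference, so no extra step is needed there.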

Further reading

The XHTML 1.0 specification includes a list of HTML compatibility guidelines for writing HTML-compatible XHTML. As well as covering the points mentioned in this article, the guidelines also offer advice for writing HTML pages that will work on both HTML browsers and XHTML browsers.
