In our Introducing XHTML article, we took a look at how XHTML differs from regular HTML 4. In this article, you’ll learn how to convert an HTML 4 Web page to fully standards-compliant XHTML 1.0 by working through a practical example.
The HTML 4 page
Take a look at the page we’re going to convert. This page validates to HTML 4.01 Transitional. The source markup looks like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<TITLE>My cat called Lucky</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
</HEAD>
<BODY>
<A NAME="top"> </A>
<H1>My cat called Lucky</H1>
I have a cat called Lucky. She is black & white, and nearly
twelve years old.<P>
I found her through a pet rescue service. She didn't like her
old home because it had a big scary dog in it that used to
frighten her. When I first got her she was very scared and
hid under the table for a whole week! Nowadays she is still
a bit jittery but much more relaxed.<P>
Here is a picture of Lucky in the garden.<P>
<IMG SRC="images/lucky-being-stroked.jpg" ALT="Lucky" WIDTH=400
HEIGHT=300 BORDER=0>
<BR><BR>
She is very good at catching mice. She also catches birds,
which can be a problem. Now that she has a collar and bell,
though, she catches fewer birds.<P>
<H2>Email Lucky!</H2>
Use the form below to send Lucky an email. You never know -
she might even reply, if she's not too busy!<P>
<FORM METHOD="post" ACTION="mailform.cgi">
Your email: <INPUT TYPE="text" NAME="email"><P>
Your message: <TEXTAREA NAME="message" COLS=40 ROWS=8>
</TEXTAREA><P>
Do you have a cat?
<INPUT TYPE="radio" NAME="haveCat" VALUE="yes" checked>Yes
<INPUT TYPE="radio" NAME="haveCat" VALUE="no">No<P>
<INPUT TYPE="submit" NAME="Send" VALUE="Send Email">
</FORM>
<P><A HREF="#top">Top of page</A>
</BODY>
</HTML>
As you can see, it’s a Web page about my cat. It’s a simple page, but it contains a lot of markup that needs to be changed if the page is going to be valid XHTML 1.0.
Changing tags to lowercase
Our first task is to change all those uppercase tags to lowercase. XHTML is case-sensitive and requires all element and attribute names to be written in lowercase. Here’s how our markup looks with lowercase tags:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>My cat called Lucky</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<a name="top"> </a>
<h1>My cat called Lucky</h1>
I have a cat called Lucky. She is black & white, and nearly
twelve years old.<p>
I found her through a pet rescue service. She didn't like her
old home because it had a big scary dog in it that used to
frighten her. When I first got her she was very scared and
hid under the table for a whole week! Nowadays she is still
a bit jittery but much more relaxed.<p>
Here is a picture of Lucky in the garden.<p>
<img src="images/lucky-being-stroked.jpg" alt="Lucky" width=400
height=300 border=0>
<br><br>
She is very good at catching mice. She also catches birds,
which can be a problem. Now that she has a collar and bell,
though, she catches fewer birds.<p>
<h2>Email Lucky!</h2>
Use the form below to send Lucky an email. You never know -
she might even reply, if she's not too busy!<p>
<form method="post" action="mailform.cgi">
Your email: <input type="text" name="email"><p>
Your message: <textarea name="message" cols=40 rows=8>
</textarea><p>
Do you have a cat?
<input type="radio" name="haveCat" value="yes" checked>Yes
<input type="radio" name="haveCat" value="no">No<p>
<input type="submit" name="Send" value="Send Email">
</form>
<p><a href="#top">Top of page</a>
</body>
</html>
Notice that we don’t need to change the values of attributes ("Lucky", "haveCat" and so on) to lowercase. Also notice that we made html lowercase in the DOCTYPE declaration at the top of the page (but left the other parts of the declaration untouched).
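For longer pages, this step can be scripted. The Python sketch below is one possible approach rather than part of the original walkthrough: the function name lowercase_markup and both regular expressions are assumptions made for illustration. It lowercases element and attribute names, leaves attribute values alone, and does not touch the DOCTYPE, which still needs editing by hand.

```python
import re

# A start or end tag: "<", optional "/", the element name, then everything
# up to the closing ">" (quoted attribute values may contain ">").
TAG = re.compile(r"""<(/?)([A-Za-z][A-Za-z0-9]*)((?:"[^"]*"|'[^']*'|[^'">])*)>""")

# An attribute name, optionally followed by "=" and a quoted or unquoted value.
ATTR = re.compile(r"""([A-Za-z_:][-\w:.]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?""")


def _lowercase_tag(match):
    slash, name, rest = match.groups()
    # Lowercase each attribute name but copy its value through unchanged.
    rest = ATTR.sub(lambda a: a.group(1).lower() + (a.group(2) or ""), rest)
    return "<" + slash + name.lower() + rest + ">"


def lowercase_markup(html):
    """Lowercase element and attribute names, leaving values and text alone."""
    return TAG.sub(_lowercase_tag, html)


print(lowercase_markup('<A HREF="#top">Top of page</A>'))
# prints: <a href="#top">Top of page</a>
```

Because the pattern only matches tags that begin with a letter after the opening bracket, declarations such as the DOCTYPE pass through unchanged.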
Quoting attribute values and expanding attributes
All attribute values need to be quoted in XHTML, even if they’re numeric. For example:
Incorrect: <img ... border=0>
Correct: <img ... border="0">
In addition, XHTML doesn’t allow you to use attribute names without their values; such attributes need to be expanded:
Incorrect: <input type="radio" ... checked>
Correct: <input type="radio" ... checked="checked">
After going through our HTML page and correcting these issues, we’re left with the following markup:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>My cat called Lucky</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<a name="top"> </a>
<h1>My cat called Lucky</h1>
I have a cat called Lucky. She is black & white, and nearly
twelve years old.<p>
I found her through a pet rescue service. She didn't like her
old home because it had a big scary dog in it that used to
frighten her. When I first got her she was very scared and
hid under the table for a whole week! Nowadays she is still
a bit jittery but much more relaxed.<p>
Here is a picture of Lucky in the garden.<p>
<img src="images/lucky-being-stroked.jpg" alt="Lucky" width="400"
height="300" border="0">
<br><br>
She is very good at catching mice. She also catches birds,
which can be a problem. Now that she has a collar and bell,
though, she catches fewer birds.<p>
<h2>Email Lucky!</h2>
Use the form below to send Lucky an email. You never know -
she might even reply, if she's not too busy!<p>
<form method="post" action="mailform.cgi">
Your email: <input type="text" name="email"><p>
Your message: <textarea name="message" cols="40" rows="8">
</textarea><p>
Do you have a cat?
<input type="radio" name="haveCat" value="yes" checked="checked">Yes
<input type="radio" name="haveCat" value="no">No<p>
<input type="submit" name="Send" value="Send Email">
</form>
<p><a href="#top">Top of page</a>
</body>
</html>
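The two fixes in this section are mechanical enough to automate. Here is a minimal Python sketch, assuming the tag and attribute names are already lowercase and that the input is HTML 4 markup whose tags do not end in "/>" directly after an unquoted value. The function name quote_attributes and the regular expressions are illustrative choices, not a standard tool.

```python
import re

# A start tag: "<", the element name, then everything up to the closing ">".
TAG = re.compile(r"""<([A-Za-z][A-Za-z0-9]*)((?:"[^"]*"|'[^']*'|[^'">])*)>""")

# An attribute name, optionally followed by "=" and a value (group 2).
ATTR = re.compile(r"""([A-Za-z_:][-\w:.]*)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?""")


def _fix_attribute(match):
    name, value = match.group(1), match.group(2)
    if value is None:
        # Minimized attribute such as "checked": expand to checked="checked".
        return f'{name}="{name}"'
    if value[0] in "\"'":
        # Already quoted: keep the value as it is.
        return f"{name}={value}"
    # Unquoted value such as border=0: wrap it in double quotes.
    return f'{name}="{value}"'


def _fix_tag(match):
    name, rest = match.groups()
    return "<" + name + ATTR.sub(_fix_attribute, rest) + ">"


def quote_attributes(html):
    """Quote unquoted attribute values and expand minimized attributes."""
    return TAG.sub(_fix_tag, html)


print(quote_attributes('<input type="radio" name="haveCat" value="yes" checked>'))
# prints: <input type="radio" name="haveCat" value="yes" checked="checked">
```

Closing tags carry no attributes, so the sketch leaves them untouched.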
Making the document well-formed
Our HTML Transitional page isn’t well-formed. Every XHTML document, Strict or otherwise, must be well-formed XML, so we’ll need to make a few changes to the markup’s structure.
Closing open elements
In order to be a well-formed XHTML document, all elements in the document must be closed. This means they need a closing tag: </p>, </b> and so on. Alternatively, if the element is empty (contains no content) then you can just place a slash (/) before the > at the end of the tag, for example <br />.
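A quick way to test whether a fragment is well-formed is to hand it to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for this example. It only checks well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# An unclosed <br> makes the fragment ill-formed.
print(is_well_formed("<p>Line one<br>Line two</p>"))    # prints: False
# Closing the empty element with a slash fixes it.
print(is_well_formed("<p>Line one<br />Line two</p>"))  # prints: True
```

Note that an XML parser also rejects bare ampersands, so a page can fail this check for reasons other than unclosed elements.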
Nesting inline elements inside block elements
Strict documents, whether HTML or XHTML, require that inline elements such as a, img and input, as well as bare text, be nested inside block-level elements such as p or div, rather than sitting directly inside body or form. This means that we need to properly wrap our text, as well as any bare inline elements, in <p></p> tags.
So let’s go through our HTML document and fix up all those unclosed elements and non-nested inline elements. Here’s the result:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>My cat called Lucky</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<p><a name="top"> </a></p>
<h1>My cat called Lucky</h1>
<p>I have a cat called Lucky. She is black & white, and nearly
twelve years old.</p>
<p>I found her through a pet rescue service. She didn't like her
old home because it had a big scary dog in it that used to
frighten her. When I first got her she was very scared and
hid under the table for a whole week! Nowadays she is still
a bit jittery but much more relaxed.</p>
<p>Here is a picture of Lucky in the garden.</p>
<p><img src="images/lucky-being-stroked.jpg" alt="Lucky" width="400"
height="300" border="0" /></p>
<p>She is very good at catching mice. She also catches birds,
which can be a problem. Now that she has a collar and bell,
though, she catches fewer birds.</p>
<h2>Email Lucky!</h2>
<p>Use the form below to send Lucky an email. You never know -
she might even reply, if she's not too busy!</p>
<form method="post" action="mailform.cgi">
<p>Your email: <input type="text" name="email" /></p>
<p>Your message: <textarea name="message" cols="40" rows="8">
</textarea></p>
<p>Do you have a cat?
<input type="radio" name="haveCat" value="yes" checked="checked" />Yes
<input type="radio" name="haveCat" value="no" />No</p>
<p><input type="submit" name="Send" value="Send Email" /></p>
</form>
<p><a href="#top">Top of page</a></p>
</body>
</html>
That’s better. We’ve closed all our elements, either by placing a closing tag after each opening tag, or by using the slash (/) shortcut to close empty elements. In addition, all inline elements are properly encased in block-level elements: in this case, p elements.
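Once a fragment is well-formed, the nesting rule can be checked by walking the direct children of body or form. The following Python sketch is a rough check under stated assumptions: the BLOCK_LEVEL set is a hand-written approximation of what XHTML 1.0 Strict allows directly inside body, and the function name stray_inline_content is made up for this example.

```python
import xml.etree.ElementTree as ET

# Approximate list of elements allowed directly inside body in Strict markup.
BLOCK_LEVEL = {
    "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "dl",
    "pre", "hr", "blockquote", "address", "fieldset", "table", "form",
    "noscript", "script", "ins", "del",
}


def stray_inline_content(container_markup):
    """List bare text and non-block elements found directly inside the root."""
    root = ET.fromstring(container_markup)  # input must already be well-formed
    problems = []
    if root.text and root.text.strip():
        problems.append("text: " + root.text.strip())
    for child in root:
        if child.tag not in BLOCK_LEVEL:
            problems.append("element: " + child.tag)
        # Text that follows a child element is stored in that child's tail.
        if child.tail and child.tail.strip():
            problems.append("text: " + child.tail.strip())
    return problems


print(stray_inline_content(
    '<body><a name="top"> </a><h1>Title</h1>Bare text<p>ok</p></body>'
))
# prints: ['element: a', 'text: Bare text']
```

An empty list means every direct child is a block-level element and no text sits loose in the container.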
Removing presentational markup
Generally speaking, XHTML encourages you to use CSS to describe the look of your pages, rather than embedding presentation within the markup. This means that attributes such as align, size and border should be replaced with CSS equivalents; such attributes are deprecated, and XHTML 1.0 Strict does not allow border on img at all. The width and height attributes can stay, since they describe the image itself and are not deprecated. Let’s change our img element accordingly, from:
<p><img src="images/lucky-being-stroked.jpg" alt="Lucky" width="400"
height="300" border="0" /></p>
to:
<p><img src="images/lucky-being-stroked.jpg" alt="Lucky" width="400"
height="300" style="border: none;" /></p>
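On a larger page it can help to list the presentational attributes that still need replacing. The Python sketch below uses the standard html.parser module and looks only at img elements; the set of attribute names (align, border, hspace, vspace) reflects the img attributes deprecated in HTML 4.01, while the class and function names are assumptions for this example.

```python
from html.parser import HTMLParser

# Presentational img attributes that HTML 4.01 marks as deprecated.
DEPRECATED_IMG_ATTRS = {"align", "border", "hspace", "vspace"}


class ImgAttrFinder(HTMLParser):
    """Collect deprecated presentational attributes found on img elements."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            for name, value in attrs:
                if name in DEPRECATED_IMG_ATTRS:
                    self.found.append((name, value))


def deprecated_img_attrs(markup):
    finder = ImgAttrFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


print(deprecated_img_attrs(
    '<p><img src="images/lucky-being-stroked.jpg" alt="Lucky" width="400" '
    'height="300" border="0" /></p>'
))
# prints: [('border', '0')]
```

The width and height attributes are deliberately left out of the set, since they are not deprecated.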
Changing name to id and encoding ampersands
Nearly there. We just need to make a couple more minor changes to turn our markup into valid XHTML.
First of all, using the name attribute to identify fragments (sections of markup to link to within the page) is deprecated in XHTML. The id attribute should be used instead. This means that we need to rewrite our #top fragment:
<a name="top"> </a>
as:
<a id="top"> </a>
Secondly, we have a single bare ampersand in our markup. This is not allowed in XHTML; all ampersands must be encoded as &amp;. So we need to change:
<p>I have a cat called Lucky. She is black & white, and nearly
twelve years old.</p>
to:
<p>I have a cat called Lucky. She is black &amp; white, and nearly
twelve years old.</p>
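Bare ampersands are easy to miss by eye, especially inside URLs. As a minimal sketch, the Python function below (the name escape_bare_ampersands is an assumption for this example) replaces every ampersand that does not already start a named or numeric character reference.

```python
import re

# An ampersand NOT followed by "name;", "#123;" or "#x1F;".
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(markup):
    """Encode bare ampersands as &amp;, leaving existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", markup)


print(escape_bare_ampersands("She is black & white"))
# prints: She is black &amp; white
print(escape_bare_ampersands("fish &amp; chips &#169; 2009"))
# prints: fish &amp; chips &#169; 2009
```

The same rule applies inside attribute values, so a query string such as a=1&b=2 in an href also needs its ampersand encoded.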
Changing the document type
Excellent! We’ve changed all our markup so that it validates to XHTML 1.0 Strict. We now need to change the page’s document type from HTML 4.01 Transitional to XHTML 1.0 Strict. The DOCTYPE for XHTML 1.0 Strict is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
In addition, we need to add an xmlns namespace declaration to the html element, which XHTML 1.0 requires on the root element:
<html xmlns="http://www.w3.org/1999/xhtml">
So our final XHTML 1.0 Strict markup looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My cat called Lucky</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<p><a id="top"> </a></p>
<h1>My cat called Lucky</h1>
<p>I have a cat called Lucky. She is black &amp; white, and nearly
twelve years old.</p>
<p>I found her through a pet rescue service. She didn't like her
old home because it had a big scary dog in it that used to
frighten her. When I first got her she was very scared and
hid under the table for a whole week! Nowadays she is still
a bit jittery but much more relaxed.</p>
<p>Here is a picture of Lucky in the garden.</p>
<p><img src="images/lucky-being-stroked.jpg" alt="Lucky" width="400"
height="300" style="border: none;" /></p>
<p>She is very good at catching mice. She also catches birds,
which can be a problem. Now that she has a collar and bell,
though, she catches fewer birds.</p>
<h2>Email Lucky!</h2>
<p>Use the form below to send Lucky an email. You never know -
she might even reply, if she's not too busy!</p>
<form method="post" action="mailform.cgi">
<p>Your email: <input type="text" name="email" /></p>
<p>Your message: <textarea name="message" cols="40" rows="8">
</textarea></p>
<p>Do you have a cat?
<input type="radio" name="haveCat" value="yes" checked="checked" />Yes
<input type="radio" name="haveCat" value="no" />No</p>
<p><input type="submit" name="Send" value="Send Email" /></p>
</form>
<p><a href="#top">Top of page</a></p>
</body>
</html>
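As a final sanity check before running the page through a validator, the result can be parsed as XML to confirm it is well-formed and that the root element sits in the XHTML namespace. This Python sketch uses the standard xml.etree.ElementTree module; the function name check_xhtml and the shortened PAGE sample are assumptions for illustration, and the check does not validate against the Strict DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A shortened stand-in for the finished page.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My cat called Lucky</title>
</head>
<body>
<p>She is black &amp; white.</p>
</body>
</html>"""


def check_xhtml(markup):
    """Return the page title if markup is well-formed, namespaced XHTML."""
    root = ET.fromstring(markup)  # raises ET.ParseError if not well-formed
    if root.tag != "{%s}html" % XHTML_NS:
        raise ValueError("root element is not html in the XHTML namespace")
    return root.findtext("{%s}head/{%s}title" % (XHTML_NS, XHTML_NS))


print(check_xhtml(PAGE))
# prints: My cat called Lucky
```

A page that passes this check can still fail validation, for example by using an attribute the Strict DTD does not define, so a validator remains the final word.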
View the finished XHTML page in all its glory!
As you can see, converting an HTML 4 page to XHTML can be fairly time-consuming, though the process is straightforward. If you’re converting a lot of pages, you might find tools such as HTML Tidy helpful, as they can convert HTML to XHTML automatically.
girishvgopal says
Hi Matt – This is a great tutorial for beginners – thanks a lot.
The link towards the end of the tutorial – “finished XHTML page” is pointing to the original html file. You may want to correct this.
Best Regards,
Girish
matt says
Hi Girish,
So it is! Well spotted. I’ll get that fixed up. 🙂
Thanks!
Matt
elioxar says
“This means that attributes such as width, height, align, size and border should be replaced with CSS equivalents”
You got a little confused here. What you say is true concerning align and border – and that’s exactly why they’re deprecated in xhtml1/html4.
‘size’ on the other is not a valid attribute, you made that up.
Your point is however not valid concerning the attributes ‘width’ and ‘height’: they are not just representational mark-up, they give information about the image in use. This is why these attributes are not deprecated. You might want to correct that in your article.
Besides, avoiding representational mark-up was just as much a goal for html4, so this chapter doesn’t really belong in here anyway.
matt says
@elioxar: Thanks for your feedback. Good point about width and height – they’re not deprecated. I’ve updated the article.
“‘size’ on the other is not a valid attribute, you made that up.”
No I didn’t: http://www.w3schools.com/tags/tag_font.asp
elioxar says
After my comment I googled the topic. Apparently I shouldn’t have said you got confused – as it was the w3c who got confused. html4/xhtml is inconsistent about width and height.
matt says
@elioxar: Yeah I think I’ve seen conflicting advice from the W3C too. I believe your basic point about width and height is valid though, since width and height are (usually) intrinsic to the image, rather than being presentational.