HTML 4.01 is the predecessor of XHTML 1.0. HTML 4.01 permits some freedoms that XHTML does not.
Mostly because XHTML is an XML application, the following rules must be satisfied:
- Documents must be well formed:
  - all elements must have closing tags or be written in the special empty-element form (such as <br />)
  - elements must be nested properly
- Element and attribute names must be in lower case.
- Non-empty elements must have closing tags.
- Attribute values must be quoted.
- Attributes must be written in full; attribute minimization is not allowed.
- Empty elements must be closed properly.
- White space in attribute values is normalized:
  - leading and trailing white space is stripped
  - sequences of one or more white space characters (including line breaks) are mapped to a single space
- Script and style elements are declared as having #PCDATA content. As a result, "<" and "&" are treated as the start of markup, and entities such as "&lt;" and "&amp;" are recognized as entity references and expanded to "<" and "&". Wrapping the content of the script or style element in a CDATA section avoids the expansion of these entities (the content is then treated as plain text rather than markup). An alternative is to use an external script or style document.
- SGML exclusions (element prohibitions). Some elements must not be nested inside themselves or inside certain other elements. The prohibitions are:
  - <a> must not contain other <a> elements
  - <pre> must not contain the <img>, <object>, <big>, <small>, <sub>, or <sup> elements
  - <button> must not contain the <input>, <select>, <textarea>, <label>, <button>, <form>, <fieldset>, <iframe> or <isindex> elements
  - <label> must not contain other <label> elements
  - <form> must not contain other <form> elements
- Redundant fragment identifiers. On some elements the id and name attributes serve the same purpose. Because XML fragment identifiers are of type ID, and id is the attribute declared with that type, the name attribute on those elements has been deprecated. These elements are <a>, <applet>, <form>, <frame>, <iframe>, <img> and <map>.
- Attributes with pre-defined value sets (e.g. the type attribute of the input element, as in type="text") are case-sensitive in XML, so their values are defined to be in lower case.
- Entity references (like &nbsp;) are case-sensitive and must be written exactly as declared, which for most named entities means lower case. Hexadecimal character references must use a lower-case x (&#xnn;, not &#Xnn;).
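The rule about script and style content can be illustrated with a small sketch. It assumes Python's standard xml.etree.ElementTree module as a stand-in for an XML parser (a browser may behave differently), and the script bodies and the script_text helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical script bodies, used only for illustration.
raw = "<script>if (a < b && c) { run(); }</script>"
escaped = "<script>if (a &lt; b &amp;&amp; c) { run(); }</script>"
cdata = "<script><![CDATA[if (a < b && c) { run(); }]]></script>"


def script_text(markup):
    """Return the element text as the XML parser sees it, or None if the
    markup is not well formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


print(script_text(raw))      # None: the bare "<" is read as the start of markup
print(script_text(escaped))  # entities are expanded back to "<" and "&"
print(script_text(cdata))    # CDATA content is passed through as plain text
```

Both the escaped and the CDATA variants give the parser the same script text; the CDATA form simply keeps the source readable.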
Documents are well formed:
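As a minimal sketch of what "well formed" means for nesting, the snippet below feeds two made-up paragraphs to Python's standard xml.etree.ElementTree parser. The is_well_formed helper is a hypothetical name, and the check covers XML well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Overlapping elements: </p> arrives while <em> is still open.
overlapping = "<p>here is an <em>emphasized paragraph.</p></em>"
# Properly nested: <em> is closed before its parent <p>.
nested = "<p>here is an <em>emphasized</em> paragraph.</p>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```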
Element and attribute names are in lower case:
<p> is not the same as <P>
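The case sensitivity can be observed with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Both documents are well-formed XML, but the element names differ.
lower = ET.fromstring("<p>text</p>")
upper = ET.fromstring("<P>text</P>")
same_name = lower.tag == upper.tag
print(same_name)  # False: "p" and "P" are different names

# Mixing the two in one element is a mismatched tag.
try:
    ET.fromstring("<p>text</P>")
    mixed_ok = True
except ET.ParseError:
    mixed_ok = False
print(mixed_ok)  # False
```

Note that <P>text</P> is still well-formed XML; it is simply not the p element that XHTML defines.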
Non-empty elements have closing tags:
<p>No closing!<p>No closing!
<p>Properly closed</p><p>Before 2nd paragraph</p>
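The two lines above can be checked mechanically. This sketch assumes Python's standard xml.etree.ElementTree parser and wraps each line in a <div>, because an XML document needs a single root element; the wrapper and the is_well_formed helper are additions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The <div> wrapper is an assumption; it only supplies a single root element.
unclosed = "<div><p>No closing!<p>No closing!</div>"
closed = "<div><p>Properly closed</p><p>Before 2nd paragraph</p></div>"

print(is_well_formed(unclosed))  # False
print(is_well_formed(closed))    # True
```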
Attribute values are quoted:
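A possible illustration, assuming Python's standard xml.etree.ElementTree parser and a made-up table cell; the parse_or_none helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_or_none(markup):
    """Parse the markup, returning None when it is not well formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


unquoted = parse_or_none("<td rowspan=3>cell</td>")    # HTML 4.01 style
quoted = parse_or_none('<td rowspan="3">cell</td>')    # XHTML style

print(unquoted)               # None: an unquoted value is rejected
print(quoted.get("rowspan"))  # 3
```

The quoting requirement applies to every value, including purely numeric ones.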
Attribute minimization is not allowed:
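As a sketch, again assuming Python's standard xml.etree.ElementTree parser and an illustrative input element (parse_or_none is a hypothetical helper), a minimized attribute fails to parse while the full name="value" form succeeds.

```python
import xml.etree.ElementTree as ET


def parse_or_none(markup):
    """Parse the markup, returning None when it is not well formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


minimized = parse_or_none("<input checked />")            # HTML 4.01 style
full = parse_or_none('<input checked="checked" />')       # XHTML style

print(minimized)    # None: an attribute name without a value is rejected
print(full.attrib)  # {'checked': 'checked'}
```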
Empty elements are closed properly:
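The following sketch assumes Python's standard xml.etree.ElementTree parser and a made-up paragraph with a line break; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


html_style = "<p>line one<br>line two</p>"         # <br> is never closed
xhtml_style = "<p>line one<br />line two</p>"      # empty-element form
explicit_pair = "<p>line one<br></br>line two</p>" # start tag plus end tag

print(is_well_formed(html_style))     # False
print(is_well_formed(xhtml_style))    # True
print(is_well_formed(explicit_pair))  # True
```

The space before "/>" in <br /> is not required by XML; it is a convention intended to help older HTML user agents cope with the empty-element form.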