Synook

HTML v XHTML

HTML and XHTML are two markup languages used for describing web pages. They are based on two different markup standards: HTML on the older SGML specification, and XHTML on XML. When writing web pages, we (basically) have a choice between these languages – so which one?

On the surface, the languages are much the same; however, there are many differences between them, not just in terms of syntax but also in terms of capabilities. For example, XML allows the definition of namespaces, which let multiple types, or “flavours”, of XML exist within the one document, for instance not just XHTML but also SVG or MathML. XML also reserves the xml: prefix for special attributes such as xml:lang, which marks the language of an element's content. However, the vast majority of people hardly use, or even have a need for, any of these additional features.
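As a minimal sketch of what namespaces make possible, the Python snippet below parses a small XHTML document that embeds an SVG drawing, and uses the standard library's `xml.etree.ElementTree` to separate the elements of each vocabulary by namespace. The document and the `elements_in` helper are hypothetical, made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the XHTML and SVG specifications.
XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Hypothetical XHTML document with an inline SVG drawing.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Mixed</title></head>
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""

def elements_in(root, namespace):
    """Hypothetical helper: local names of all elements in `namespace`, in document order."""
    prefix = "{" + namespace + "}"
    return [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]

root = ET.fromstring(DOC)
print(elements_in(root, XHTML))  # ['html', 'head', 'title', 'body', 'p']
print(elements_in(root, SVG))    # ['svg', 'circle']

# The reserved xml: prefix is bound to its own fixed namespace URI.
print(root.get("{http://www.w3.org/XML/1998/namespace}lang"))  # en
```

ElementTree expands every tag to `{namespace-uri}localname`, so the two vocabularies never collide even though both could, in principle, use the same element names.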

Therefore, in the end, most of us are really just writing HTML with XML syntax, and indeed we are even serving it with the text/html MIME type, which makes browsers parse it as ordinary HTML. While this is technically allowed for XHTML 1.0, such documents must follow the compatibility guidelines in its Appendix C – so much so that we are basically reduced to writing HTML anyway, just with a few extra syntactic restrictions.

Indeed, one of the reasons people advocate XHTML is its stricter syntax, and this is true: formally, XML’s syntax is more restrictive than that of SGML, which can be written in very strange ways (with omitted end tags, for example). However, nothing in HTML stops us from writing it in a syntactically conservative manner as well. The validator won’t pick it up, you say? Well, the validator doesn’t pick up many things anyway, and we must not let writing in XHTML lure us into a false sense of security when we see that big “valid” sign. In my opinion, the extraneous syntactic additions of XHTML only make the languages harder to understand and, ultimately, make constructing valid documents harder, especially for beginners.
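The difference in strictness is easy to see with Python's standard library. In this sketch, the same malformed fragment (an unclosed `<p>` and overlapping `<b>`/`<i>`) is rejected outright by the XML parser but accepted without complaint by the lenient `html.parser` module. Neither parser is a validator, and `html.parser` is not the HTML5 parsing algorithm browsers use; the fragment and the `TagCounter` class are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical malformed markup: <p> is never closed and <b>/<i> overlap.
FRAGMENT = "<div><p>One <b>two <i>three</b> four</i></div>"

def is_well_formed_xml(markup):
    """Return True if markup parses as XML, False if the XML parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

class TagCounter(HTMLParser):
    """Counts start tags; html.parser does not check nesting, so it raises no error here."""
    def __init__(self):
        super().__init__()
        self.start_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1

counter = TagCounter()
counter.feed(FRAGMENT)
counter.close()

print(is_well_formed_xml(FRAGMENT))  # False
print(counter.start_tags)            # 4
```

The XML parser stops at the first mismatched end tag, while the HTML tokenizer simply reports the four start tags it saw. Careful authors can, of course, write HTML that would pass both.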

In the end, there are some cases in which XHTML is advantageous, such as when other scripts need to parse the documents conveniently. Nevertheless, in the vast majority of cases, writing XHTML conveys no significant advantage, and HTML remains by far the simpler and more compatible option. Even the W3C have given up on future separate versions of XHTML, preferring to focus on HTML’s own syntax in HTML5 and then serialize the same language as XML.
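To illustrate the parsing advantage mentioned above: because well-formed XHTML is XML, any general-purpose XML tool can read it. The sketch below, assuming a hypothetical page and a hypothetical `link_targets` helper, uses Python's `xml.etree.ElementTree` to list every link target without any HTML-specific parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML page.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p><a href="/about">About</a> and <a href="/contact">Contact</a></p>
    <p>No link here<br/></p>
  </body>
</html>"""

# Bind a short prefix to the XHTML namespace so the path query stays readable.
NS = {"h": "http://www.w3.org/1999/xhtml"}

def link_targets(xhtml):
    """Return the href of every <a> element, in document order."""
    root = ET.fromstring(xhtml)
    return [a.get("href") for a in root.iterfind(".//h:a", NS)]

print(link_targets(PAGE))  # ['/about', '/contact']
```

The same approach fails on typical real-world HTML, which is rarely well-formed XML; that is exactly the trade-off described above.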