HTML Metadata

Metadata in HTML and XHTML is specified rather differently. For example, the recommended way to specify the author of an HTML page differs between HTML 4 and XHTML 2.
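
As a rough illustration of that difference, the sketch below gives the author once in the HTML 4 / XHTML 1 style and once in the XHTML 2 style. The name John Doe and the dc prefix are placeholders, and the XHTML 2 form follows the working draft, so it may not match whatever syntax is finally adopted.

```html
<!-- HTML 4 and XHTML 1: the name attribute with the author keyword -->
<meta name="author" content="John Doe" />

<!-- XHTML 2 (draft): the property attribute with a Dublin Core term.
     Assumes xmlns:dc="http://purl.org/dc/elements/1.1/" is declared
     on the html element. -->
<meta property="dc:creator" content="John Doe" />
```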

Link Relations: rel and rev
The rel and rev link relations can be used in both <link> and <a> elements. Link relations are used if you want to point to another resource (e.g. an HTML page or an RDF resource). In particular, the rel attribute specifies what the target URL is in relation to the current document, while rev specifies the reverse: what the current document is in relation to the target.
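
A minimal sketch of both placements, assuming hypothetical file names (book.html, chapter2.html and so on). The link element goes in the head, the a element in the body, and the rev example reads in the opposite direction to rel.

```html
<head>
  <title>Chapter 3</title>
  <!-- rel: chapter4.html is the "next" resource relative to this document -->
  <link rel="next" href="chapter4.html" />
  <!-- rev: this document is a "chapter" of book.html -->
  <link rev="chapter" href="book.html" />
</head>
<body>
  <!-- the same kind of keyword on an ordinary hyperlink -->
  <p><a rel="prev" href="chapter2.html">Previous chapter</a></p>
</body>
```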

The following keywords are defined in HTML 4 and XHTML 2:


 * alternate: Designates alternate versions of the document. When used together with the hreflang attribute, it implies a translated version of the document. When used together with the hrefmedia attribute, it indicates a version intended for that type of device.
 * stylesheet: Refers to an external style sheet. (Deprecated in XHTML 2.)
 * start: Refers to the first resource in a collection of resources. A typical use case might be a collection of chapters in a book.
 * next: Refers to the next resource (after the current one) in an ordered collection.
 * prev: Refers to the previous resource (before the current one) in an ordered collection.
 * up: Refers to the resource "above" in a hierarchically structured set. (New in XHTML 2.)
 * contents: Refers to a resource serving as a table of contents.
 * index: Refers to a resource providing an index.
 * glossary: Refers to a resource providing a glossary of terms.
 * copyright: Refers to a copyright statement for the resource.
 * chapter: Refers to a resource serving as a chapter in a collection.
 * section: Refers to a resource serving as a section in a collection.
 * subsection: Refers to a resource serving as a subsection in a collection.
 * appendix: Refers to a resource serving as an appendix in a collection.
 * help: Refers to a resource offering help (more information, links to other sources of information, etc.).
 * bookmark: Refers to a bookmark. A bookmark is a link to a key entry point within an extended document. The title attribute may be used, for example, to label the bookmark. Note that several bookmarks may be defined for a document.
 * meta: Refers to a resource that provides metadata, for instance in RDF. (New in XHTML 2.)
 * icon: Refers to a resource that represents an icon, similar to the favicon.ico file. (New in XHTML 2.)
 * shortcut icon: See icon. (Non-standard keyword introduced by Internet Explorer.)
 * p3pv1: Refers to a P3P Policy Reference File. (New in XHTML 2.)

In addition, XHTML 2 defines the profile, role and cite keywords, but their usage is not entirely clear. The list at http://www.w3.org/TR/relations.html appears to be an older set of keywords; using those is not recommended.
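
To make the keywords above more concrete, here is a sketch of the head of one chapter in a multi-page document. All file names and the Dutch translation are hypothetical; only keywords from the list are used.

```html
<head>
  <title>Chapter 2</title>
  <!-- position of this page within an ordered collection -->
  <link rel="start" href="chapter1.html" title="First chapter" />
  <link rel="prev" href="chapter1.html" />
  <link rel="next" href="chapter3.html" />
  <!-- supporting resources for the collection -->
  <link rel="contents" href="toc.html" />
  <link rel="glossary" href="glossary.html" />
  <link rel="copyright" href="copyright.html" />
  <!-- a translated version of this page -->
  <link rel="alternate" hreflang="nl" href="chapter2.nl.html" />
</head>
```

How much of this a browser exposes to the user varies; many user agents ignore the navigation keywords entirely.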

Stylesheets
As seen above, the valid way to specify a stylesheet in HTML 4 and XHTML 1 is a link element with the stylesheet keyword.
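
A minimal sketch, assuming a hypothetical style sheet named style.css in the same directory as the page:

```html
<head>
  <title>Example page</title>
  <!-- external style sheet, HTML 4 / XHTML 1 style -->
  <link rel="stylesheet" type="text/css" href="style.css" />
</head>
```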



In XHTML 2, the stylesheet keyword is deprecated in favour of the style element with a src attribute (HTML 4 and XHTML 1 already contain the style element, but not the src attribute).
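
Going only by the description above, the XHTML 2 form might look like the sketch below. The file name style.css is hypothetical, and the exact attribute set is an assumption: XHTML 2 never progressed beyond working drafts, and its syntax changed between drafts.

```html
<head>
  <title>Example page</title>
  <!-- XHTML 2 (draft): external style sheet through the style element;
       the src attribute is taken from the text above, not verified
       against a specific draft -->
  <style src="style.css" type="text/css" />
</head>
```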



Finally, it is also possible to specify the stylesheet in the XML prolog, using the xml-stylesheet processing instruction, though not all browsers support this. This is, however, the recommended way of specifying style sheets in SVG images.
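
A sketch of the processing-instruction form in an XHTML document, again with a hypothetical style.css. The instruction has to appear before the root element.

```html
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="style.css"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example page</title></head>
  <body><p>Styled through the processing instruction above.</p></body>
</html>
```

In an SVG file the same xml-stylesheet line is placed before the svg root element.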



If the type is not given, HTML uses text/css by default, or whatever is given in the Content-Style-Type HTTP header.

Meta tags
The <meta> element can have a property (in XHTML 2) or name (in HTML 4 and XHTML 1) attribute, with a specific keyword. The following keywords are specified:
 * description: Gives a description of the resource.
 * generator: Identifies the software used to generate the resource.
 * keywords: Gives a comma-separated list of keywords describing the resource.
 * robots: Gives advisory information intended for automated web-crawling software.
 * title: Specifies a title for the resource.
 * author: Specifies the creator of the HTML page. (Deprecated in XHTML 2.)
 * copyright: Gives the copyright statement. (Deprecated in XHTML 2.)

HTML4 did not formally specify any keywords, but the appendix mentions the keywords, description and robots keywords, while the examples mention the author and copyright keywords, which are deprecated in XHTML2.
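
For illustration, a head section using several of these keywords in the HTML 4 / XHTML 1 form (the name attribute). The content values, including the generator name, are made up for this sketch.

```html
<head>
  <title>Example page</title>
  <meta name="description" content="An overview of metadata in HTML and XHTML" />
  <meta name="keywords" content="HTML, XHTML, metadata" />
  <meta name="generator" content="ExampleEditor 1.0" />
  <!-- advisory only: well-behaved crawlers will not index this page
       or follow its links -->
  <meta name="robots" content="noindex, nofollow" />
</head>
```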

XHTML 2 also defines reference as the default keyword if none is present. This is only useful if the property attribute is used on elements other than the <meta> element. See http://www.w3.org/TR/xhtml2/mod-meta.html and http://www.w3.org/TR/xhtml2/mod-metaAttributes.html.

HTTP equivalent data
It is sometimes not possible to (easily) alter the HTTP headers. In those cases, it is possible to specify a substitute HTTP header using the <meta> element:

<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />

The following HTTP headers describe the document and can be given this way:


 * Content-Type: The MIME type of the body, with optional charset. For example "text/html; charset=ISO-8859-1" for HTML or "application/xhtml+xml; charset=UTF-8" for XHTML. Regrettably, Internet Explorer versions before 9 do not understand the application/xhtml+xml MIME type.
 * Content-Language: Describes the natural language(s) of the intended audience (thus not necessarily a list of all languages used in the document).
 * Content-Length: Size of the full HTTP body (thus all the HTML code), in bytes.
 * Content-Location: The URI of the original resource, in case it can be accessed at separate locations.
 * Content-MD5: Message integrity check (MIC) of the entity-body, using an MD5 checksum.
 * Expires: Gives the date/time after which the response is considered stale. Unfortunately, the required format is the rather clumsy RFC 1123 date format (e.g. Wed, 01 Dec 2010 16:00:00 GMT).
 * Last-Modified: Specifies the last modification date of the document. Specified in the archaic RFC 1123 format.
 * Content-Style-Type: The default MIME type for style sheets. By default text/css. (Defined by HTML 4.)
 * Content-Script-Type: The default MIME type for scripts. By default text/javascript. (Defined by HTML 4.)
 * Cache-Control: Specifies how end hosts and intermediate proxies must cache the results, e.g. max-age=3600.
 * Pragma: Obsolete header, defined for backwards compatibility with HTTP 1.0. Pragma: no-cache has the same meaning as Cache-Control: no-cache.
 * PICS-Label: Obsolete header, defining the content rating of a document. The Internet Content Rating Association (ICRA) has now replaced PICS with ICRA labels, which use RDF files. For these new labels you need to use <link rel="meta" href="icra-label.rdf" type="application/rdf+xml" />.

In addition, RFC 2616 (HTTP 1.1) defines the Allow, Content-Encoding, and Content-Range entity-headers, but these do not seem useful in an HTML <meta> element.
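
A few of the headers listed above, written as http-equiv substitutes. This is only a sketch: the values are examples, and how far a browser honours each one differs per header. Intermediate proxies generally do not parse HTML, so they will not see these values at all.

```html
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="Content-Language" content="en" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta http-equiv="Content-Script-Type" content="text/javascript" />
  <!-- RFC 1123 date format -->
  <meta http-equiv="Expires" content="Wed, 01 Dec 2010 16:00:00 GMT" />
  <meta http-equiv="Cache-Control" content="max-age=3600" />
  <title>Example page</title>
</head>
```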

External Metadata Specifications
All HTML variants allow an extension of keywords using external namespaces. According to a survey by Google, the most popular namespaces are the Dublin Core and XFN.

Specifying the external namespace
In HTML 4 and XHTML 1 (as recommended by the HTML 4 spec and the Dublin Core spec):

<head profile="http://dublincore.org/documents/dcq-html/">
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.creator" content="John Doe" />

While the Dublin Core extends the keywords for the <meta> element, XFN extends the keywords for the rel attribute of the <a> and <link> elements. In addition, while the Dublin Core is used for elements inside the <head> element, XFN is typically used only for rel attributes of <a> elements in the <body> of the HTML page (so on <a> elements rather than <link> elements). Even so, for HTML 4, the profile attribute should be added to the <head> element, not to the <body> element:

<head profile="http://gmpg.org/xfn/11"> ... </head>
<body> ... <a href="http://johndoe.example.com/" rel="co-worker">John Doe</a> ... </body>

The profile attribute allows multiple values, separated by spaces. However, the HTML 4 specification says that all values but the first URI may be ignored.
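
A sketch that combines the two profiles used earlier in one attribute value. The page content is hypothetical. Given the caveat above, the profile that matters most is best listed first.

```html
<!-- two profiles, separated by a space; a user agent may honour only the first -->
<head profile="http://dublincore.org/documents/dcq-html/ http://gmpg.org/xfn/11">
  <title>Example page</title>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.creator" content="John Doe" />
</head>
<body>
  <p><a href="http://johndoe.example.com/" rel="co-worker">John Doe</a></p>
</body>
```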

In XHTML 2 (as shown in the XHTML specs):

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:dc="http://purl.org/dc/elements/1.1/">
<meta property="dc:creator" content="John Doe" />

Alternatively, you can still use a profile, though this is specified with a <link> element rather than on the <head> element. Since XHTML 2 is still in progress as of this writing, I expect that only one method will remain in the end:

<html xmlns="http://www.w3.org/1999/xhtml">
<link rel="profile" href="http://purl.org/dc/elements/1.1/" />
<meta property="creator" content="John Doe" />

If you use element refinements of the Dublin Core, like date.created rather than just date, it is not obvious how to specify this in XHTML, since the date element is defined in one namespace, while the refinement created is defined in another. There are in fact two equivalent ways to define it, as shown by these two <meta> elements:

<head profile="http://dublincore.org/documents/dcq-html/">
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.date.created" content="2001-07-18" />
<meta name="DCTERMS.created" content="2001-07-18" />

See the articles on Dublin Core and RDF schemas for more information about other terminologies.