As web developers, we write a lot of code, and we take the time to select the most semantically appropriate HTML elements for our content. When writing tutorials and articles that include code, why don’t we also take the time to properly mark up our code samples in HTML?
Most developers take the route of using a JavaScript syntax highlighter or a server-side solution like Pygments. These aren’t bad solutions; in fact, they are pretty great. But they just use a bunch of <span> elements with classes to do the syntax highlighting. We can do better, semantically.
The Elements
We still need the trusty <pre><code> wrapper used by the above-mentioned scripts. But for the code inside, we can do so much better than meaningless <span>s.
Looking at the HTML specification we have a bunch of elements that make sense within our code blocks:
<var>: variables
<b>: keywords, like built-in functions and properties
<i>: taxonomy, technical term, designation; like properties, methods, attributes
<small>: fine print or comments
<del>: code that is removed
<ins>: code that is added
<mark>: important, highlighted code
<samp>: computer output
<kbd>: user input
And we still use the <span>, but only for things that have no meaning in the context, like a regular old string.
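To make the mapping above concrete, here is a minimal JavaScript sketch of a helper that wraps a piece of code in the matching element. The names `ELEMENT_FOR`, `escapeHtml`, and `wrapToken`, as well as the role labels, are hypothetical and chosen only for illustration; they do not come from any existing highlighter.

```javascript
// Hypothetical mapping from a token's role to the element suggested above.
const ELEMENT_FOR = {
  variable: 'var',
  keyword: 'b',
  term: 'i',
  comment: 'small',
  removed: 'del',
  added: 'ins',
  important: 'mark',
  output: 'samp',
  input: 'kbd',
  string: 'span', // no particular meaning in context
};

// Escape the characters that HTML would otherwise parse as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Wrap one piece of source code in the element for its role.
function wrapToken(role, text) {
  if (!Object.prototype.hasOwnProperty.call(ELEMENT_FOR, role)) {
    throw new Error('Unknown role: ' + role);
  }
  const tag = ELEMENT_FOR[role];
  return '<' + tag + '>' + escapeHtml(text) + '</' + tag + '>';
}
```

For example, `wrapToken('keyword', 'div')` returns `<b>div</b>`, and an unknown role throws instead of silently falling back to a <span>.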
Some Examples
Let’s walk through some samples to see how to use these elements in our code samples.
HTML
What you want to see:
(view source to see it in practice)
<div class="group">
<h1>Sample HTML</h1>
<p>Some super awesome content!</p>
</div><!-- end group -->
What HTML you would write:
&lt;<b>div</b> <i>class</i>=<span>"group"</span>&gt;
&lt;<b>h1</b>&gt;Sample HTML&lt;/<b>h1</b>&gt;
&lt;<b>p</b>&gt;Some super awesome content!&lt;/<b>p</b>&gt;
&lt;/<b>div</b>&gt;<small>&lt;!-- end group --&gt;</small>
So, from a semantics point of view, I believe the above code is more appropriate. The notable additions are:
<b> for wrapping around the HTML elements; semantically, the element names are keywords in their context.
<i> for wrapping attributes on the HTML elements; I suppose that attributes are taxonomies or designations.
<span> for wrapping around strings in code, since the string really doesn’t have any meaning.
<small> for wrapping around comments.
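One detail worth spelling out: the literal angle brackets of the sample have to be written as entities (`&lt;` and `&gt;`), otherwise the browser parses them as markup instead of displaying them. The rules above could be sketched as a few small JavaScript functions. The function names are hypothetical, and the sketch assumes attributes are passed as a plain object of strings.

```javascript
// Escape characters that HTML would otherwise treat as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Build the marked-up form of an opening tag such as <div class="group">.
// `attributes` is a plain object of attribute name -> value (an assumption).
function markUpOpeningTag(name, attributes = {}) {
  const attrs = Object.entries(attributes)
    .map(([key, value]) =>
      ' <i>' + escapeHtml(key) + '</i>=<span>"' + escapeHtml(value) + '"</span>')
    .join('');
  return '&lt;<b>' + escapeHtml(name) + '</b>' + attrs + '&gt;';
}

// Build the marked-up form of a closing tag such as </div>.
function markUpClosingTag(name) {
  return '&lt;/<b>' + escapeHtml(name) + '</b>&gt;';
}

// Build the marked-up form of an HTML comment.
function markUpComment(text) {
  return '<small>' + escapeHtml('<!-- ' + text + ' -->') + '</small>';
}
```

Calling `markUpOpeningTag('div', { class: 'group' })` produces the first line of the sample, with the element name in <b>, the attribute in <i>, and the string in <span>.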
CSS
What you want to see:
(view source to see it in practice)
/* Sectioning Elements */
.group {
background-image: url("pattern.png");
border: 1px solid #000;
}
What HTML you would write:
<small>/* Sectioning Elements */</small>
<i>.group</i> {
<b>background-image</b>: url(<span>"pattern.png"</span>);
<b>border</b>: 1px <b>solid</b> #000;
}
The same sorta rules apply for CSS: <b> for keywords, such as properties and keyword values like solid; <i> for selectors; <span> for strings; and, of course, <small> for comments.
JavaScript
Let’s look at something a little more complex for JavaScript, including additions, deletions, and important parts.
What you want to see:
(view source to see it in practice)
document.getElementById('foo').addEventListener('click', function(evt)
{
var myMessage = 'Hello World!';
alert(myMessage)
console.log(myMessage)
}, false)
What HTML you would write:
<b>document</b>.<i>getElementById</i>(<span>'foo'</span>).<i>addEventListener</i>(<span>'click'</span>, <b>function</b>(<var>evt</var>)
{
<b>var</b> <var>myMessage</var> = <span>'Hello World!'</span>;
<del><b>alert</b>(<var>myMessage</var>)</del>
<ins><mark><b>console</b>.<i>log</i>(<var>myMessage</var>)</mark></ins>
}, <b>false</b>)
A notable addition is the use of <var> for wrapping around variables and arguments. The example uses <b> around JavaScript keywords; for document and console, it may be more technically correct to wrap them in <var> elements, but I like to reserve <var> for variables and arguments I have created.
When marking up objects and their properties, I feel that it’s more appropriate to mark up the parent as a <b> or <var> and all the child properties/functions as <i> elements, as in document.getElementById().
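As a rough illustration of that parent/child rule, the following JavaScript sketch marks up a dotted chain. The function name `markUpChain` and its `parentTag` parameter are hypothetical, and the sketch assumes the chain contains only plain identifiers, so no escaping is done.

```javascript
// Mark up a dotted chain such as "document.getElementById".
// `parentTag` is 'b' for built-in objects or 'var' for your own variables.
// Assumes every segment is a plain identifier (no characters needing escape).
function markUpChain(chain, parentTag = 'b') {
  const [parent, ...children] = chain.split('.');
  const head = '<' + parentTag + '>' + parent + '</' + parentTag + '>';
  const tail = children.map((name) => '.<i>' + name + '</i>').join('');
  return head + tail;
}
```

Here `markUpChain('document.getElementById')` gives `<b>document</b>.<i>getElementById</i>`, while `markUpChain('myMessage.length', 'var')` treats the parent as a variable you created.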
Since the above code example is kinda like a tutorial, I decided to show that <del> and <ins> can be used to demonstrate code that should be deleted and code that should be inserted.
Using the <mark> element, we can highlight a passage of code that is important to the current step in the tutorial.
PHP
What about going even further and integrating documentation with our code samples by linking to further resources?
What you want to see:
(view source to see it in practice)
<?php
$db = new PDO('mysql:dbname=mydb;host=localhost', 'root', '');
$db->exec('DELETE FROM mytable');
What HTML you would write:
<small>&lt;?php</small>
<var>$db</var> = <b>new</b> <a href="http://php.net/pdo"><b>PDO</b></a>(<span>'mysql:dbname=mydb;host=localhost'</span>, <span>'root'</span>, <span>''</span>);
<var>$db</var>-&gt;<a href="http://php.net/pdo.exec"><i>exec</i></a>(<span>'DELETE FROM mytable'</span>);
Since the above sample is teaching about PDO and exec(), why not link those functions to the PHP documentation? Readers can then jump into deeper material at their discretion.
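If typing those links by hand gets tedious, a small lookup could add them. This is only a sketch: `DOC_URLS` and `markUpWithDocs` are hypothetical names, and the two URLs are simply the ones used in the sample above.

```javascript
// Documentation URLs taken from the PHP sample above; extend as needed.
const DOC_URLS = {
  PDO: 'http://php.net/pdo',
  exec: 'http://php.net/pdo.exec',
};

// Wrap a name in the given element and, when a documentation URL is known,
// in a link to it. Names without a known URL are returned without a link.
// Assumes `name` is a plain identifier, so no escaping is done.
function markUpWithDocs(name, tag) {
  const markedUp = '<' + tag + '>' + name + '</' + tag + '>';
  const known = Object.prototype.hasOwnProperty.call(DOC_URLS, name);
  return known
    ? '<a href="' + DOC_URLS[name] + '">' + markedUp + '</a>'
    : markedUp;
}
```

With this, `markUpWithDocs('PDO', 'b')` yields the linked <b> element from the sample, and a name that is not in the table, such as `markUpWithDocs('query', 'i')`, comes back as a plain `<i>query</i>`.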
Terminal
I’m not going to cover using <samp> and <kbd> because the HTML spec has some great examples.
Defining the Language
There is a great question on Stack Overflow about the most semantically appropriate way to define the programming language of a code sample. I’m a fan of using the class attribute, like so:
<pre><code class="language-html">…</code></pre>
Why Bother?
Well, I recognize many developers are lazy. And I recognize that the automated solutions have extra benefits such as copy-friendly versions and line numbers, etc. But code seems to be one of those places in online content and documentation where markup can be improved. It seems that web developers spend time choosing appropriate elements for general content, then just dump the code onto the page. True craftsmen would sweat all the details.
Am I over-the-top semantic crazy? Likely. But semantics and code are what we do, so let’s spend a little more time on our documentation samples.