"...Decent surfing value..."
"...A long rant..."
"... If you ignore the spelling and grammatical errors, ... you will find this enlightening..."


Dark Side of the HTML

There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.
Shakespeare.

this is overlapping bold italic text

Historical Note.

This page has been around for so damn long (at least by Internet Time Standards) without any modifications that it ended up living in its own space-time continuum, with no visible correlation to ours. I felt compelled to do something about it, and after countless sleepless nights spent thinking about how to avoid rewriting the whole thing, I came up with this historical note.

Latest developments.

At the end. Mind-shattering news. The darkest stuff.
[updated 26-December-1996]

Intro

It's common knowledge that all documents on the WWW should be written in so-called HTML, a.k.a. HyperText Markup Language.

Much less is known about what counts as HTML.

There's a rather vague relationship between HTML (HyperText Markup Language) as (almost) standardized by the Internet Engineering Task Force and whatever is called HTML as implemented by WEB browsers. To add to the confusion, there are several levels and revisions(?) of HTML. For example, there's HTML-2.0 Level-1, and HTML-3.0. HTML-2.0 is supposed to be some sort of standard that most browsers are trying to support.

To make things really obfuscated, it should be noted that HTML is a markup language. Markup means that it marks the different elements of your document, but how the document will look to WEB-wandering individuals is entirely up to the miscellaneous WEB browsers running on a multitude of various operating systems.

Interestingly, some browsers are rushing to support the yet-to-be-defined HTML-3.0 while ignoring some basic features of the (almost) standard HTML-2.0. It looks like HTML profanation is getting really profound.

A sudden paroxysm of critical paranoia struck me, and I wrote this page.

Tags

Tags are the basis of HTML. Tags are what differentiates HTML from simple, dull, dumb, boring, plain vanilla ASCII text. Sometimes it may seem that HTML is just a loose collection of various tags. Unfortunately, HTML is also a language -- in its own right. Language is what the last letter of HTML stands for. But this story will be about tags.

Very little attention is paid to what exactly a tag is. Common sense says that a tag is some word (called the tag identifier) surrounded by angle brackets. For example, <H1> declares the beginning of a level-one heading, and a law-abiding browser should display that heading in a rather big font. The left angle bracket, a.k.a. the less-than sign, is called the start-tag open symbol, and the right angle bracket, a.k.a. the greater-than sign, is called the tag close symbol.
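To see tag identifiers and text pulled apart the way the paragraph above describes, here is a minimal sketch using Python's standard `html.parser.HTMLParser`. It is an illustration under assumptions, not a validator: this parser follows modern HTML tokenizing rules rather than SGML, so it knows nothing about tricks such as SHORTTAG minimization, and it reports tag identifiers in lower case (HTML tag names are case-insensitive). The names `TagLister` and `list_tags` are hypothetical.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Records start tags, end tags and text in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # tag is the tag identifier, already lower-cased by the parser
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def list_tags(markup):
    parser = TagLister()
    parser.feed(markup)
    parser.close()  # flush any trailing text
    return parser.events


print(list_tags("<H1>This is heading number one</H1>"))
# [('start', 'h1'), ('text', 'This is heading number one'), ('end', 'h1')]
```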

Most tags should be balanced -- when they belong to an element with some content, like

  <H1>This is heading number one</H1>
-- where the opening tag is followed by the closing tag. Note that the end-tag open symbol is a less-than sign followed by a slash, </. For some HTML elements the start tag, the end tag, or even both can be omitted. For example, the paragraph end tag </P> can be omitted.
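A toy checker can make the balancing rule concrete. This is a sketch under stated assumptions: only P is treated as having an omissible end tag, and only BR, HR and IMG as empty elements (hand-picked, hypothetical lists; a real SGML parser takes this information from the DTD). `BalanceChecker` and `check` are made-up names.

```python
from html.parser import HTMLParser

# Assumption: hand-picked lists for illustration; a real parser reads the DTD.
OPTIONAL_END = {"p"}             # end tag may be omitted
EMPTY = {"br", "hr", "img"}      # never take an end tag


class BalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag in EMPTY:
            return
        if tag == "p" and self.stack and self.stack[-1] == "p":
            self.stack.pop()   # a new <P> implicitly ends the open one
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in EMPTY:
            return
        # Implicitly close open elements whose end tag may be omitted.
        while self.stack and self.stack[-1] != tag and self.stack[-1] in OPTIONAL_END:
            self.stack.pop()
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check(markup):
    """Return a list of balancing problems (an empty list means balanced)."""
    checker = BalanceChecker()
    checker.feed(markup)
    checker.close()
    unclosed = [t for t in checker.stack if t not in OPTIONAL_END]
    return checker.errors + [f"unclosed <{t}>" for t in unclosed]


print(check("<H1>This is heading number one</H1>"))  # []
print(check("<P>First paragraph<P>Second one"))       # []
print(check("<H1>Oops</H2>"))  # ['unexpected </h2>', 'unclosed <h1>']
```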

Diversion

Being a lousy typist, I have severe trouble typing markup. For example, to get angle brackets you have to keep pressing and releasing the SHIFT key while tapping the less-than and greater-than keys. Typing proper closing tags is also quite boring, especially when they are nested.

So I was quite excited when I found that the HTML-2.0 language definition allows tag minimization: buried deep inside the dark mess of the HTML-2.0 SGML declaration (don't confuse it with the DTD, the Document Type Definition) was the magick word SHORTTAG in the FEATURES section, and it was set to YES.

I rushed to my keyboard and typed this:

        <H1/First minimized HTML tag ever typed by humanity/
Nothing happened.

Netscape (which I, like millions of other people, am evaluating for 90 days to decide whether to purchase an ongoing license to the Software) just ignored this thing as if it weren't there. Mosaic for Windows didn't get much further either.

Slightly puzzled as to whether my knowledge was wrong or the browsers were screwed, I went to the HTML Validation Service (went in the cyber sense, you know; actually, I just ran a search on Yahoo) and found that minimized tags are perfectly legal even under the strong arm of the Strict HTML law.
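Under SHORTTAG YES, the form typed above is what SGML calls a NET-enabling start tag: the `/` after the tag identifier takes the place of `>`, and the next `/` (the null end tag) closes the element. Here is a minimal Python sketch that rewrites such tags into the long form browsers do understand, assuming no attributes and no `/` or `<` inside the content. `expand_net` is a hypothetical name, and real SGML handling of minimized markup is considerably more involved.

```python
import re

# <NAME/content/ : a NET-enabling start tag followed by a null end tag.
# Simplifications: no attributes, and the content may not contain '/' or '<'.
NET_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/<]*)/")


def expand_net(markup):
    """Rewrite <H1/text/ into the long form <H1>text</H1>."""
    return NET_TAG.sub(r"<\1>\2</\1>", markup)


print(expand_net("<H1/First minimized HTML tag ever typed by humanity/"))
# <H1>First minimized HTML tag ever typed by humanity</H1>
```

Ordinary tags such as `<H1>` or `</H1>` do not match the pattern, so already-expanded markup passes through unchanged.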

Now I entertain myself by hounding various web browsers with the test pages shown below.

Here they are -- perfectly HTML-compliant and utterly useless, tragically invisible and infernally hostile to any existing HTML rendering device ...

Minimization TAGS

Wait, there's more ...

During HTML validation, I found some things that contradicted what I had heard just before. Nothing serious, just another little splash of critical paranoia:

Ubiquitous <P> tag

The paragraph element is one of the few elements whose end tag can be omitted.
Roaming around various WEB tutorials and other wisdom stores, I saw mentions that it is bad style to put a paragraph break after something that implies a paragraph break by itself, like a <P> tag right after </h1>. The trail ultimately led to an HTML spec page that showed two examples (I've edited them for brevity's sake):

"Bad"

        <h1>What not to do</h1>
        <p>This is like bad or something...

"Good"

        <h1>What to do</h1>
        This is like good <p>or something...<p>

Without much hesitation, I fed both examples to the HTML Validation Service and slammed them against strict HTML-2.0.
As I expected, the results were exactly the reverse of the examples' names: the "Bad" example passed the test and the "Good" example incurred the wrath of the validator.

After consulting the HTML-2.0 spec, I found that the Validation Service was 100% right (it would have been surprising if it weren't). In case someone doesn't know: in HTML-2.0 the paragraph is a non-empty element with a mandatory start tag and an optional end tag (<P> and </P> respectively). Therefore a paragraph in HTML-2.0 is a container for text data and in-line markup (emphasis, links, images, line breaks); a block element such as a list implicitly ends it. And since strict HTML-2.0 does not allow bare text directly in the body, the "Good" example, with its text sitting right after </h1>, is an error. In HTML-1.0 the paragraph was an EMPTY element, which actually represented not a paragraph but a paragraph break, and it had only a start tag, <P>, similar to the line break <BR>. I failed to find the HTML-1 DTD, but I think things are pretty close to what I've described.

Note that wedging paragraph tags within <h1>...</h1> is an error, so don't try to catch me on this.

Another note: non-strict HTML-2.0 is more relaxed, so both examples would be ok.
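In strict HTML-2.0 the body may hold headings and block elements, but not bare text. That rule can be approximated in a few lines of Python. This is a toy under the assumption that a short hand-picked list of elements (`TEXT_CONTAINERS`, hypothetical) is all that may hold text directly; a real validator works from the full HTML-2.0 DTD. It reports any text outside such a container and treats the start of a new container as an implicit </P>. `StrayTextFinder` and `stray_text` are made-up names.

```python
from html.parser import HTMLParser

# Assumption: simplified list of elements allowed to contain text directly.
TEXT_CONTAINERS = {"title", "h1", "h2", "h3", "h4", "h5", "h6",
                   "p", "li", "dt", "dd", "pre", "address"}


class StrayTextFinder(HTMLParser):
    """Toy strict check: report text that is not inside any text container."""

    def __init__(self):
        super().__init__()
        self.open = []    # currently open text containers
        self.stray = []   # text found directly in the body

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_CONTAINERS:
            if self.open and self.open[-1] == "p":
                self.open.pop()   # </P> is optional: the next block closes it
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open:
            while self.open.pop() != tag:  # also drops an implicitly closed P
                pass

    def handle_data(self, data):
        if data.strip() and not self.open:
            self.stray.append(data.strip())


def stray_text(markup):
    finder = StrayTextFinder()
    finder.feed(markup)
    finder.close()
    return finder.stray


bad = "<h1>What not to do</h1>\n<p>This is like bad or something..."
good = "<h1>What to do</h1>\nThis is like good <p>or something...<p>"
print(stray_text(bad))   # []
print(stray_text(good))  # ['This is like good']
```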

After testing, one question was left: where did such an interesting HTML specification page come from? My guess (and I think I'm right with a probability of about 0.99): it was a remnant of the HTML-1.0 spec, safely decomposing in some dark corner of the W3 Consortium. I failed to find the head or TOC of that document. Interestingly enough, the link to the HTML-1.0 spec on the W3 page was hopelessly broken too.

Minimal HTML document

Looking at different HTML tutorials, I found a surprising multitude of opinions on what should be considered a minimal HTML document. Id est, what is the minimal set of tags you have to add to make your ASCII text a valid HTML document? In other words, where is the thin metaphysical boundary beyond which plain ASCII text becomes HYPER? To cut off the fuzziness of the word "valid", I decided that valid would mean fully conforming to the HTML-2.0 (strict) specification (or DTD -- for those who behold). Since I wasn't sure myself what the minimal valid document is, I dug up the HTML-2.0 spec and stared at it for a moment.

I found the following amazing (or maybe not) facts:

Summing up all of the above, a minimal document would look like this:

    <HTML>
      <HEAD>
         <TITLE>Minimal HTML Document</TITLE>
      </HEAD>      
      <BODY>
      </BODY>
    </HTML>

But... all the elements except TITLE turn out to have optional start and end tags! So unless you are a typing maniac, the minimal HTML document would be:

    <TITLE>Minimal HTML Document</TITLE>

If the document is to convey any sort of information, the minimal document conforming to strict HTML-2.0 would be (note the <P> tag!):

    <TITLE>Minimal HTML Document</TITLE>
    <P>Some text without any spark of sense.

Oh, yes: if we want to treat our minimal HTML document as an SGML one, a document type declaration should precede everything, like:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
    <TITLE>Minimal HTML Document</TITLE>

-- but now we are starting to play by SGML rules, and no browser can stand where SGML reigns...

Non-breaking space

This character can be used whenever you want to protect your precious spaces from a browser that jams all spaces into one.

The non-breaking space has the character value 160; its numeric character reference is &#160; and its entity reference is &nbsp;. &nbsp; is a part of HTML-2.0, by the way.

Some browsers, like every version of X-Mosaic, ignore both &#160; and &nbsp;, or, like Arena, ignore only &#160;, while Mosaic for Windows and Netscape (on every platform) can cope with both.
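The difference between the two kinds of space can be sketched in Python with the standard `html.unescape` function (which follows HTML5 decoding rules, not HTML-2.0) and a toy space-collapsing function standing in for a browser. `collapse_spaces` is a hypothetical helper. Both references decode to the same character, 160 (U+00A0), and the collapsing step leaves it alone.

```python
import html
import re


def collapse_spaces(text):
    """Toy browser model: runs of ordinary white space become one space.
    U+00A0 (no-break space) is deliberately not in the character class."""
    return re.sub(r"[ \t\r\n\f]+", " ", text)


nbsp = html.unescape("&nbsp;")
print(ord(nbsp))                        # 160
print(nbsp == html.unescape("&#160;"))  # True

source = "a&nbsp;&nbsp;&nbsp;b   c"
print(repr(collapse_spaces(html.unescape(source))))
# 'a\xa0\xa0\xa0b c'
```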

It should be noted that proportional fonts (the default in many browsers) usually have a pretty narrow space character, so it is advisable to switch to a fixed-width font before using non-breaking spaces.

Here's an example:

   <dl><dd>
   <tt>&nbsp;&nbsp;&nbsp;</tt>Look, 
   here's some paragraph with indent,<br> 
   whoa -- check it out.
   </dl>

will look like

   Look, here's some paragraph with indent,
whoa -- check it out.

A useful side effect of the non-breaking space is that browsers do not treat it as a space at all, so it can be used whenever you want to keep words from breaking apart.
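The word-protecting effect can be modelled with a toy greedy line breaker that, like a browser, breaks lines only at ordinary spaces. `wrap` and the width measured in characters are assumptions for illustration; real browsers measure rendered widths.

```python
def wrap(text, width):
    """Greedy line breaking at U+0020 only; U+00A0 stays inside a word."""
    lines, current = [], ""
    for word in text.split(" "):
        if not word:
            continue  # skip empty pieces from repeated spaces
        candidate = word if not current else current + " " + word
        # An over-long single word still gets a line of its own.
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


print(wrap("New York is big", 6))       # ['New', 'York', 'is big']
print(wrap("New\u00a0York is big", 6))  # ['New\xa0York', 'is big']
```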

Moral of the story

Now that I've had enough of the subject, it's just the right time to outline what I've tried to say and what the best approach to coping with HTML would be:

Credits.

HAIL to the folks at WebTechs (formerly HAL) for the pretty useful HTML Validation Service referred to throughout this manuscript. It saved me quite a lot of time running SP manually.

HAIL to James Clark @ jclark.com, creator of the most profound SGML parser so far. One of the previous versions of this parser is used in the HTML Validation Service.

The end.

26-December-1996

Beyond the dark side: loads of incredibly odd information about tables in TABLEMAQUIA.


You can send your frustrated comments to me. Take care then.
Disclaimer: This page is not information.