Why Word is Bad for the Web
The Quote Problem
All those foreign letters you see in that text were originally
nothing more than an attempt to make documents a tiny bit nicer
to look at. You see, the design of the keyboard comes from the
age of typewriters, and its symbols are the ones a typewriter
could print. We're stuck with our keyboard designs, but they
were never meant to account for all the extra letters and
characters included in modern fonts. This led to the quote
problem.
What's the quote problem? Well, to answer that, take a look at
your keyboard. Notice how there's only one kind of double-quote
mark - the straight one. Worse, when you want a single quote,
you have to use the same key as for apostrophes! Now, if you
were writing on paper, you'd put differently shaped marks at the
start and end of a quotation, instead of just making straight
lines. Altogether, five different marks on paper (opening and
closing double quotes, opening and closing single quotes, and
the apostrophe) only get two symbols on the keyboard.
Long ago, Microsoft decided to solve this problem. First, they
set up Word to look for quote marks and replace them with nicer,
curly quotes, known as 'smart quotes'. Then, they took some
character codes that looked unused - hey, what could anyone ever
want those for? - and decided that they would represent these
new, pretty quotes. In the Windows-1252 character set the curly
quotes sit at codes 0x91 to 0x94, a range that ISO 8859-1 leaves
without any printable characters.
Everything was fine until, years later, people started copying
text they'd written in Word and pasting it onto the web. Because
Microsoft didn't stick to any international standard when they
chose how to represent their smart quotes, the quotes ended up
displaying as all sorts of unintended strange letters in web
browsers. Word's users never meant to do this, but Word had gone
ahead and done it for them, because the smart quotes feature is
turned on by default!
Not so smart after all, was it?
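To make the mix-up concrete, here is a minimal Python sketch of what
can happen to a curly apostrophe when the bytes are written in one
character set and read back in another. The sample text and the
straighten() helper are illustrative assumptions, not anything Word
itself provides; the byte values come from the Windows-1252 and UTF-8
encodings.

```python
# A curly apostrophe (U+2019), the kind Word substitutes for a straight one.
text = "it\u2019s"

# Windows-1252 stores the curly apostrophe as the single byte 0x92.
cp1252_bytes = text.encode("cp1252")      # b'it\x92s'

# The same text saved as UTF-8 uses three bytes for the apostrophe.
utf8_bytes = text.encode("utf-8")         # b'it\xe2\x80\x99s'

# Reading UTF-8 bytes as if they were Windows-1252 turns one
# punctuation mark into three unrelated letters and symbols.
garbled = utf8_bytes.decode("cp1252")     # 'itâ€™s'

# One possible cleanup: map the four curly quotes back to straight ones.
STRAIGHT = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
})

def straighten(s: str) -> str:
    """Replace curly quotes with their plain keyboard equivalents."""
    return s.translate(STRAIGHT)

print(garbled)
print(straighten("\u201cit\u2019s\u201d"))
```

Straightening the quotes loses the nicer typography, so declaring the
correct character encoding on the page is usually the better fix where
that is possible.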
Terrible HTML
Of course, there's more to all this. When Microsoft finally
caught on that the web was going to be big, they quickly added
web features to Word, not least of which is the ability to save
documents to HTML. Unfortunately for the rest of the world,
though, Microsoft again failed to stick to any standards at all.
They made up their own HTML tags to represent the layout of Word
documents, purely to make sure that the documents would look the
same if people wanted to open them in Word and save them in
another format. These proprietary tags now pollute HTML
documents all over the web, simply because the people who
created the pages by saving as HTML in Word don't know enough to
remove them - and the extra markup makes pages bigger and slower
to load.
Worse, even if you do remove all the Word-specific tags from the
documents, the leftover HTML is still a nightmare. Presumably
Microsoft decided to re-use the HTML generation engine from
FrontPage, with the same kinds of results - a complete and utter
mess.
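For anyone stuck cleaning up such a page, the following is a rough
Python sketch of stripping the most recognisable Word leftovers. It
assumes the markup Word's HTML export is commonly seen to produce:
conditional comments, elements with the o:, v: and w: prefixes,
class names starting with Mso, and style attributes containing mso-
properties. The function name strip_word_markup is made up for this
example, and regular expressions are only a crude stand-in for a real
HTML parser.

```python
import re

def strip_word_markup(html: str) -> str:
    """Remove common Word-specific markup; a sketch, not a full cleaner."""
    # Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL)
    # Office-namespaced tags such as <o:p> and </o:p>; inner text is kept.
    html = re.sub(r"</?[ovw]:[^>]*>", "", html)
    # class attributes like class=MsoNormal (quoted or unquoted).
    html = re.sub(r'\s+class="?Mso[A-Za-z0-9]*"?', "", html)
    # style attributes that mention mso- properties. This drops the whole
    # attribute, including any ordinary CSS that shared it.
    html = re.sub(r'\s+style="[^"]*mso-[^"]*"', "", html)
    return html

sample = ('<!--[if gte mso 9]><xml><o:OfficeDocumentSettings/></xml>'
          '<![endif]-->'
          '<p class=MsoNormal style="mso-margin-top-alt:auto">'
          'Hello<o:p></o:p></p>')
print(strip_word_markup(sample))  # <p>Hello</p>
```

Even after a pass like this, the remaining HTML still needs checking by
hand.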
Smart Tags
Do you think it ends there? Amazingly, it doesn't. For their
latest versions of Word, Microsoft decided it'd be great to add
something they called 'smart tags' - a kind of 'link' that adds
contextual information to things you type. For example, if you
type an address in your document, Word spots it and lets you
link through to a map. Useful? Very rarely.
The problem comes when documents containing smart tags are saved
as HTML - the tags are saved too! This means that documents all
over the web have odd text linked to completely frivolous
places, simply because Word thought it looked like an address.
Not only do these links take ages to load correctly, but they're
ugly too.
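As a small illustration, smart tags in Word's HTML output typically
show up as elements with an st1: prefix wrapped around the recognised
text. The Python sketch below removes those wrappers and keeps the
text inside. The st1 prefix and the sample markup are assumptions
based on commonly seen output, and unwrap_smart_tags is a name
invented for this example.

```python
import re

# Opening and closing smart tag elements, e.g. <st1:City w:st="on">
SMART_TAG = re.compile(r"</?st1:[^>]*>")

def unwrap_smart_tags(html: str) -> str:
    """Drop smart tag wrappers but keep the text they surround."""
    return SMART_TAG.sub("", html)

sample = ('<p>Visit <st1:City w:st="on"><st1:place w:st="on">London'
          '</st1:place></st1:City> soon.</p>')
print(unwrap_smart_tags(sample))  # <p>Visit London soon.</p>
```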
What might Microsoft Word unleash upon the web next? We can only
wait in fear.
About the author:
Information supplied and written by Lee Asher of Eclipse Domain
Services
Domain Names, Hosting, Traffic and Email Solutions.