Syntax error handling
[This file explains how netrik handles syntax errors in HTML pages, why it does
so, and what to do about them. See index.txt or
index.html for an overview of the available documentation.]
Why does netrik always complain about HTML syntax errors??
No matter what site I load, I (almost) always get error messages. That can't be OK?!
Well, it *is* OK... The problem is that almost all sites actually *are* more or
less broken.
Other browsers simply ignore the errors, hoping that the result will more or
less resemble what the author intended -- and usually nobody notices anything,
except maybe some little blemish no one cares about. (If Netscape hadn't been so
tolerant of syntax errors in the first place, we wouldn't have this problem
today. :-( ) However, a browser can only "guess" what the author intended, and
that doesn't always work out; such errors can cause the page to be laid out
completely wrong, for example with parts of the text missing. That's why netrik
warns about them: so the user at least knows what is wrong, and also gets the
chance to tell the page author about the problem. (See below.)
Note that XHTML even *requires* a browser to abort on syntax errors -- sadly,
XHTML is not very popular. (Yet?...)
Of course, it might also happen that netrik sees an error where there is none
-- but no such case is known yet, so that is quite unlikely.
Can't I turn that off?
Well, actually, you can: There are two options, "--broken-html" and
"--ignore-broken". "--ignore-broken" will prevent netrik from complaining about
*any* syntax errors, while "--broken-html" only turns off warnings about common
errors, which in most cases can be guessed correctly and nicely worked around.
(But not always!)
Add "--broken-html" to your ~/.netrikrc file (see
config.* for details) to always disable halting on less important errors.
However, please avoid using these options if possible. We know that the way
the error messages are presented now is quite insistent; we can't change that
behaviour as long as multi-windowing is not implemented, though. Thus, if you
use netrik seriously, you probably can't do without --broken-html -- sadly,
almost all pages on the Net are more or less broken. (But please try to avoid
--ignore-broken; using it is generally a bad idea!)
Still, please take the (little) trouble to tell the page author about the
problem.
Reporting a problem
Normally you will find the e-mail address of the right person to contact at the
bottom of the page, or on some "contact" page. Most authors will be grateful for
the information, and will gladly fix the problem -- once and for all.
You can improve your chances by stating exactly what the problem is. To find
out, load the page again using the "--debug" option. In this mode, netrik dumps
the source while parsing the page. When an error is encountered, the last
character printed before the error message is the offending one. Note, however,
that in some cases the real cause of the problem may lie somewhere earlier.
If the output is too long to read, you can either use the --dump option and pipe
the output through a pager (but note that the debug output is written to stderr,
so you'll need to redirect it: "netrik --debug --dump <url> 2>&1 |less" or
similar), or use the "--correct-html" option, which causes netrik to abort
after the first error is detected.
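If the merged output of "netrik --debug --dump <url> 2>&1" is too long to scan
in a pager, a small script can pull out just the characters printed before the
first error message. The following is a minimal Python sketch under stated
assumptions: it assumes every error message starts with a fixed marker string,
and ERROR_MARKER is a hypothetical placeholder (this document does not say what
netrik's messages look like). debug_output() and context_before_error() are
illustrative names, not part of netrik.

```python
import subprocess
from typing import Optional

# Hypothetical placeholder: set this to the text that introduces netrik's
# error messages in your version's --debug output.
ERROR_MARKER = "error"


def debug_output(url: str, netrik: str = "netrik") -> str:
    """Run 'netrik --debug --dump <url>' and return everything it printed,
    with stderr (where the debug output goes) merged into stdout, like the
    shell redirection '2>&1'."""
    result = subprocess.run(
        [netrik, "--debug", "--dump", url],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stdout
        text=True,
        errors="replace",  # pages in odd encodings must not crash decoding
    )
    return result.stdout


def context_before_error(output: str, marker: str = ERROR_MARKER,
                         width: int = 60) -> Optional[str]:
    """Return up to `width` characters printed right before the first
    occurrence of `marker`; per the rule above, the last of them should be
    the offending character. Returns None if `marker` does not occur."""
    pos = output.find(marker)
    if pos == -1:
        return None
    return output[max(0, pos - width):pos]
```

For example, print(repr(context_before_error(debug_output(url)))) shows the
context before the first message; repr() makes a trailing line break visible,
in case the message format puts one before the marker.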
Alternatively, you can use the W3C's (World Wide Web Consortium) HTML validation service.
This will create a report with all problems nicely listed -- very helpful for
the page author.
A note on comments
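Netrik now has full SGML comment parsing. The problem is that the comment
syntax in SGML is fairly complicated; most authors do not know it, and thus are
not aware that putting a "--" string inside a comment normally generates errors:
in SGML, each "--" inside a "<!...>" declaration starts or ends a comment, so
text after the second "--" is outside the comment, where only whitespace, a new
"--" comment, or the closing ">" is allowed. In most cases these errors are
quite obvious; netrik will print an error message and apply a workaround that
succeeds in most cases, but not always -- just as for other typical errors.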
However, there are some constructs which *are* valid SGML, but do not do what
the author probably intended. "<!------>" is a typical example: it is valid, but
the comment doesn't terminate. The six dashes form one complete (empty) comment
followed by the start of a second one, so the ">" belongs to that second comment,
and everything after it will also be treated as part of the comment! As this is
not an error, netrik can't print an error message or apply a workaround; instead,
only a warning is printed.
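To make the "<!------>" case concrete, here is a minimal Python sketch of how
an SGML comment declaration can be scanned. It is not netrik's actual parser;
scan_comment_declaration() and CommentStatus are hypothetical names, and the
rules are simplified: "<!" must be followed directly by "--" or ">", every "--"
toggles between inside and outside a comment, and outside a comment only
whitespace, another "--", or the closing ">" may appear.

```python
from enum import Enum
from typing import Tuple


class CommentStatus(Enum):
    OK = "ok"                      # declaration closed by ">" outside any comment
    ERROR = "error"                # something other than whitespace between comments
    UNTERMINATED = "unterminated"  # input ended before the closing ">"


def scan_comment_declaration(text: str, start: int = 0) -> Tuple[CommentStatus, int]:
    """Scan an SGML comment declaration that begins with "<!" at text[start].

    Returns (status, index): for OK the index just past ">", for ERROR the
    index of the offending character, for UNTERMINATED len(text).
    """
    if not text.startswith("<!", start):
        raise ValueError("text does not start a markup declaration here")
    i = start + 2
    if i >= len(text):
        return CommentStatus.UNTERMINATED, len(text)
    if not (text.startswith("--", i) or text[i] == ">"):
        return CommentStatus.ERROR, i  # not a comment declaration at all
    in_comment = False
    while i < len(text):
        if text.startswith("--", i):
            in_comment = not in_comment  # "--" opens or closes a comment
            i += 2
        elif in_comment:
            i += 1  # any character, including ">", is just comment text
        elif text[i] == ">":
            return CommentStatus.OK, i + 1
        elif text[i].isspace():
            i += 1  # whitespace is allowed between comments
        else:
            return CommentStatus.ERROR, i
    return CommentStatus.UNTERMINATED, len(text)
```

With these rules, "<!-- a -- b -->" is reported as an error at the "b", and
"<!------>" as unterminated. Appending " <p>text</p> -->" to the latter makes
the declaration close at the final ">", with the whole paragraph silently
swallowed into the comment; that is exactly the case where only a warning can
help.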
Often such an unterminated comment will generate other errors later, because it
interferes with other comments or because it stretches to the end of the file;
in such a case, the warning may help in finding the problem. In other cases,
however, no error is generated at all -- the warning is the only hint that
something went wrong.