netrik hacker's manual
>========================<

[This file contains a description of the hyperlink mechanism. See hacking.txt or hacking.html for an overview of the manual.]

In contrast to most other aspects of functionality, there is no separate module for link handling. The components forming the link mechanism are spread over several other modules instead.

There is a links.c file (and a related links.h, of course), but it contains only some simple helper functions -- at least for now.

parse_struct()

The first step is done already in the layout module: The links need to be extracted while parsing the structure of the page to generate the item tree. This is described under Links in hacking-layout.*.

Link Struct

The start and end position of a link, and the link's destination (extracted from the "href" attribute) are stored in "string->link[]".

This is a list of "struct Link", containing the data of all links and form controls inside the current text item:

  • The ints "start" and "end" store the start and end positions of the link inside the string
  • The page coordinates of the link are stored in "x_start", "x_end", "y_start" and "y_end"
  • The "value" string contains the link's destination URL for normal links, or the form data for form controls.
  • "form" is an "enum Form_control", telling to what kind of link or form control the link structure refers
  • "name" is a string storing the value of the "name" or "id" attribute identifying a form control
  • The "enabled" flag indicates whether a form element is to be submitted to the server; for conditional elements (checkboxes etc.) this depends on the current state
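The fields listed above could be declared roughly as follows. This is only a sketch: the field names are taken from the list, but the types, the field order, and most enumerator names are assumptions (only FORM_NO, FORM_OPTION and FORM_MULTIOPTION are named in this manual), and link_is_form_control() is a hypothetical helper.

```c
#include <stddef.h>

/* Kind of link or form control. Only FORM_NO, FORM_OPTION and
   FORM_MULTIOPTION appear in this manual; the other names are guesses. */
enum Form_control {
    FORM_NO,            /* normal hyperlink */
    FORM_TEXT,
    FORM_PASSWORD,
    FORM_HIDDEN,
    FORM_CHECKBOX,
    FORM_RADIO,
    FORM_SUBMIT,
    FORM_OPTION,
    FORM_MULTIOPTION
};

struct Link {
    int start, end;             /* position of the link inside the string */
    int x_start, x_end;         /* page coordinates */
    int y_start, y_end;
    char *value;                /* destination URL, or form data */
    enum Form_control form;     /* what kind of link or control this is */
    char *name;                 /* "name" or "id" of a form control */
    int enabled;                /* submit this control to the server? */
};

/* Hypothetical helper: does the entry describe a form control? */
static int link_is_form_control(const struct Link *link)
{
    return link->form != FORM_NO;
}
```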

Link List

To allow easy access to all links on the page without having to scan every text item, another structure is created. This structure doesn't itself store any data; it only stores pointers to the entries inside the text items' link lists.

The "Link_list" structure (defined in items.h) contains:

  • The number of links on the page, and
  • An array of "struct Link_ptr", containing for every link:
    • The item containing the link
    • The number of the link inside the item's link list

This structure is created by make_link_list() (just after parse_struct()). This function traverses all items, and for every encountered text item stores pointers to all of its links. The only exception are links representing "hidden" form input fields: These aren't stored in the link list, so they won't be selectable.
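The traversal described above might look like the following sketch. The structures are reduced to what the loop needs; the "count" and "link" member names, the ITEM_TEXT/ITEM_BOX enumerators, FORM_HIDDEN, and the function name make_link_list_sketch() are assumptions, and the real function may allocate differently.

```c
#include <stdlib.h>

enum Form_control { FORM_NO, FORM_TEXT, FORM_HIDDEN };  /* reduced; partly assumed names */
enum Item_type { ITEM_TEXT, ITEM_BOX };                 /* reduced; assumed names */

struct Link { enum Form_control form; };

struct Item {
    enum Item_type type;
    struct Item *list_next;     /* next item in traversal order */
    struct Link *link;          /* links of a text item */
    int link_count;
};

struct Link_ptr {
    struct Item *item;          /* item containing the link */
    int num;                    /* index inside the item's link list */
};

struct Link_list {
    int count;
    struct Link_ptr *link;
};

/* Collect pointers to all selectable links; returns NULL if out of memory. */
struct Link_list *make_link_list_sketch(struct Item *first)
{
    struct Link_list *list = malloc(sizeof *list);
    if (list == NULL)
        return NULL;
    list->count = 0;
    list->link = NULL;

    for (struct Item *item = first; item != NULL; item = item->list_next) {
        if (item->type != ITEM_TEXT)
            continue;
        for (int i = 0; i < item->link_count; ++i) {
            struct Link_ptr *grown;

            if (item->link[i].form == FORM_HIDDEN)
                continue;       /* hidden inputs are not selectable */
            grown = realloc(list->link, (list->count + 1) * sizeof *grown);
            if (grown == NULL) {
                free(list->link);
                free(list);
                return NULL;
            }
            list->link = grown;
            list->link[list->count].item = item;
            list->link[list->count].num = i;
            ++list->count;
        }
    }
    return list;
}
```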

pre_render()

When assigning coordinates to all items in the various sub-functions of pre_render() (as another part of the layouting process), the page coordinates of the links also have to be calculated. This is done in assign_ywidth(), and described under Link Coordinates in hacking-layout.*.

display()

After the links are extracted during layouting, they can be activated and followed in the pager. (Which is described in hacking-pager.*)

Activating Links

Links can be activated in the pager using the link selection commands -- presently these are only the "J" and "K" commands.

How these commands decide which link to activate is described under Link Selection Commands in hacking-pager.*.

activate_link()

Actually activating (or deactivating) a selected link is done with the activate_link() function. This function is explicitly called by the link selection commands, but can also be called automatically by scroll_to() (see hacking-pager.*), if the active link is scrolled out of the valid screen area.

As a special case, the current link is deactivated by calling activate_link with -1 as argument, so no link is active afterwards.

Before activating the requested link, the page is scrolled so that the link is actually in the valid screen area. (This isn't done when called with -1, as links can be deactivated even if they are no longer on the screen.) Scrolling is done using scroll_to().

The next step is deactivating a previously active link, if any. This is done by recursively calling activate_link() (with -1 as argument). Of course this is also done only when activating a new link -- if we are only to deactivate the currently active one, we won't call ourselves to do so...

Now the affected link is highlighted (or unhighlighted, if it is to be deactivated) using the highlight_link() function from links.c. This function takes the page descriptor (for the item tree and the link list), the link number, and a flag indicating whether to highlight or unhighlight as arguments. The highlighting is done by finding the first div of the associated string that belongs to the link text, and changing its background color (and that of all other divs before the link end). This is done by or-ing the color with "color_map[TM_ACTIVE]"; deactivating is done by clearing the bits.
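The bit manipulation can be illustrated with a small sketch. Everything here is simplified and assumed: the "struct Div" layout (each div storing only its end position and color), the ACTIVE_BITS value standing in for "color_map[TM_ACTIVE]", and the function name highlight_divs().

```c
/* Stand-in for color_map[TM_ACTIVE]; the real value is configured elsewhere. */
#define ACTIVE_BITS 0x40

/* Simplified div: a run of text ending before "end", drawn in "color". */
struct Div {
    int end;
    int color;
};

/* Set (on != 0) or clear (on == 0) the highlight bits of every div
   that lies inside the link text [link_start, link_end). */
void highlight_divs(struct Div *div, int div_count,
                    int link_start, int link_end, int on)
{
    int div_start = 0;          /* a div starts where the previous one ends */

    for (int i = 0; i < div_count; ++i) {
        if (div[i].end > link_start && div_start < link_end) {
            if (on)
                div[i].color |= ACTIVE_BITS;
            else
                div[i].color &= ~ACTIVE_BITS;
        }
        div_start = div[i].end;
    }
}
```

Clearing with "&= ~ACTIVE_BITS" restores the original color only if the original color never had those bits set, which is what this sketch assumes.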

Afterwards, the screen area affected by the link activation is re-rendered.

The coordinates of the affected page area are normally determined by the coordinates assigned to the link in pre_render(); if the link spans multiple lines (contains line wraps), however, the x-coordinates of the item containing the link are taken instead, so all lines containing parts of the link are repainted as a whole. (It would be too complicated to determine the exact affected line parts, and it doesn't make much of a difference anyhow.)

The coordinates are truncated to the part visible on the screen, and the area is repainted.

Finally, "active_link" is set to the newly activated link. (Which is -1, if the link was to be deactivated.)
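The order of the steps described above can be summarized in a sketch. It does no real scrolling or painting: note() is a hypothetical stand-in that only records which step would run, and activate_link_sketch() is not the real function signature.

```c
#include <stdio.h>
#include <string.h>

static int active_link = -1;    /* -1: no link is active */
static char trace[256];         /* records the order of steps, for illustration only */

/* Hypothetical stand-in for scroll_to(), highlight_link() and repainting. */
static void note(const char *step, int link)
{
    size_t used = strlen(trace);
    snprintf(trace + used, sizeof trace - used, "%s(%d) ", step, link);
}

void activate_link_sketch(int link)
{
    if (link >= 0) {
        note("scroll", link);               /* bring the link on screen first */
        if (active_link >= 0)
            activate_link_sketch(-1);       /* deactivate the old link */
        note("highlight", link);
        note("repaint", link);
    } else if (active_link >= 0) {
        note("unhighlight", active_link);   /* no scrolling when deactivating */
        note("repaint", active_link);
    }
    active_link = link;                     /* -1 if we only deactivated */
}
```

The recursion is at most one level deep, because the inner call always passes -1 and therefore never scrolls or recurses again.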

Following Links

If some link is active, it can be followed in the pager by pressing <return>. The pager immediately exits in this case, with a return value indicating that a link was followed; everything else has to be done by the caller. (See below.)

main()

Following the links itself is done in main(). When it sees that a link was followed (the pager returned "RET_LINK"), first the "active_link" URL has to be extracted.

get_link() retrieves the link structure of the desired link (by the page's link list) inside the item tree, and returns a pointer to it, so all link data can be accessed.

The URL stored in the link structure is copied to the local "url".

Having this, the old page is not needed any longer, and is freed using free_layout(); afterwards, the new page is loaded using load_page() with the current page's URL as base.

If the link only references a local anchor (the URL starts with '#'), the old page isn't freed; instead, its descriptor is passed as the "reference" parameter to load_page(), so the page data will be reused and only the anchor activated, instead of reloading the page.
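The decision between reloading and reusing the page comes down to a one-character test. A minimal sketch, with both function names being hypothetical and the page represented by an opaque pointer:

```c
#include <stddef.h>

/* Hypothetical helper: does the link URL reference only a local anchor? */
int is_local_anchor(const char *url)
{
    return url != NULL && url[0] == '#';
}

/* What main() would pass to load_page() as "reference": the old page
   for a local anchor (the page is kept and reused), NULL otherwise
   (the old page is freed before the new one is loaded). */
const void *reference_for(const char *url, const void *old_page)
{
    return is_local_anchor(url) ? old_page : NULL;
}
```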

init_load()

The full URL of the new page to be loaded needs to be determined using the link URL and the current page's URL. That is done in init_load() (called from load_page()). This process is described in detail in hacking-load.*.

Forms

HTML forms are handled together with links. The "form" element of the Link Struct indicates that the link list entry refers to some form control, not to a normal link.

forms.c

forms.c contains some helper functions for handling forms.

set_form()

set_form() writes the value of a form control (given by its link structure as argument) to a string in the item tree, so it will be displayed on the output page. It is called directly to set the initial value just after the form element was extracted during structure parsing, when the link list doesn't exist yet.

(It is presently also used instead of update_form() in one place -- which is a hack; see below under Manipulating.)

First it has to find out where to store the data. For this, the string containing the element is searched for the first div *after* the link start. (The first div of a form link, starting at the link start, is the form indicator, e.g. '['; the following div then contains the value.)

Having this, the value can be stored to the string. How this is done depends on the type of the form control.

For text, password, and hidden input fields, as much of the "value" as the div length is copied to the string div. If there is less than the div length, the remaining space is padded with '_' characters. (Hidden input fields are displayed like text fields, only they are "dim", and can't be selected.)

For radio buttons and checkboxes, either an '*' is put in, or an nbsp. This is determined by the "enabled" flag of the form link. (It does *not* depend on the "value"!)

For <select> options, the character introducing the option (the one *before* the first char of the option text) is set either to '+' or to '-', also depending on whether "enabled" is set.
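The three cases can be sketched as below. This is not the real set_form() signature: the function works on a plain character buffer, "pos" is assumed to be the start of the value div (for options, the first char of the option text), FORM_TEXT and FORM_CHECKBOX are assumed enumerator names, and NBSP assumes a Latin-1 encoding.

```c
#include <string.h>

#define NBSP '\xA0'             /* non-breaking space, assuming Latin-1 */

enum Form_control { FORM_TEXT, FORM_CHECKBOX, FORM_OPTION };    /* reduced; partly assumed names */

/* Write the visible representation of a form control into "text".
   The value div is assumed to occupy text[pos] .. text[pos + len - 1]. */
void set_form_sketch(char *text, int pos, int len,
                     enum Form_control form, const char *value, int enabled)
{
    switch (form) {
    case FORM_TEXT: {           /* also password and hidden fields */
        int value_len = (int)strlen(value);
        for (int i = 0; i < len; ++i)
            text[pos + i] = i < value_len ? value[i] : '_';
        break;
    }
    case FORM_CHECKBOX:         /* also radio buttons; "value" is ignored */
        text[pos] = enabled ? '*' : NBSP;
        break;
    case FORM_OPTION:           /* marker sits just before the option text */
        text[pos - 1] = enabled ? '+' : '-';
        break;
    }
}
```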

update_form()

update_form() is very similar; the difference is that it takes the whole page structure and a link number from the link list as arguments; the string and the link structure are then extracted from the list, and set_form() is called to actually set the data.

get_form_item()

get_form_item() is used to find out to which form a form control belongs. (This is necessary for submit buttons.) It takes the link number of the form control, retrieves the right form item in the item tree, and returns a pointer to it.

The form item is found by starting with the item containing the form control (determined by the Link List), and going up in the item tree until an item of type ITEM_FORM is found -- it contains the item with the button, so it belongs to the form in which the button is located.
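The upward search is a plain parent walk. In this sketch the "parent" member name and the ITEM_TEXT/ITEM_BOX enumerators are assumptions (only ITEM_FORM is named above), and find_form_item() starts from the containing item rather than from a link number:

```c
#include <stddef.h>

enum Item_type { ITEM_TEXT, ITEM_BOX, ITEM_FORM };  /* ITEM_FORM is from the manual; the rest assumed */

struct Item {
    enum Item_type type;
    struct Item *parent;        /* NULL at the top of the tree */
};

/* Go up from the item containing the control until a form item is found.
   Returns NULL if the control is not inside any form. */
struct Item *find_form_item(struct Item *item)
{
    while (item != NULL && item->type != ITEM_FORM)
        item = item->parent;
    return item;
}
```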

form_next()

form_next() (together with form_start()) is used to get all form elements of some form from the item tree. (This is necessary when submitting, or when activating "radio" input elements.) Depending on the "filter" flag, it returns either all form elements, or only the ones that are "successful" (have a name, a value, and are enabled), and should be submitted to the server.

Every time it is called, it returns one link pointer, belonging to a form control. When there are no more elements in the form, NULL is returned.

form_next() takes/returns a "struct Form_handle" as parameter, which stores the form item, the item of the last returned link, the link number of the last returned link, and the "filter" flag. This is used to keep track of which form elements have already been returned in previous calls. The handle has to be initialized by form_start(), which takes the form item and the "filter" flag as parameters, and returns the handle. (Not a pointer!)

form_next() traverses all items below the form item, using the "list_next" pointers. In every text item found, it scans all of its links. Both loops work directly with the handle values as loop variables. They are not initialized on entering, so that they continue right where the last call to form_next() stopped.

To ensure all items inside the form are scanned, form_start() has to find the first item inside the form, so form_next() will start scanning there. This is done by repeatedly descending to "first_child", until a childless item is found -- this is always the first one in the form.

+------+                                                                       start -->    +------+
| text |-.                                                                               ,->| form |
+------+=|===============================================================================|=>+------+
         |                                                                               |    x  ^
         |                                                  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  +
         |                                                  x ++++++++++++++++++++++++++++++++++++
         |                                                  v +                     +    |
         |                                                +-----+                +-----+ |
         |                             1. descend -->  ,->| box |-.           ,->| box |-'
         |                                             |  +-----+=|===========|=>+-----+==>NULL
         |                                             |    x ^   |           |    x ^
         |    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +   |    xxxxxxxxxxxxx +
         |    x +++++++++++++++++++++++++++++++++++++++++++++++   |    x +++++++++++++
         |    v +          +                      +    |          |    v +    |
         |  +------+   +------+                +-----+ |          |  +------+ |
         `->| text |-->| text |-.           ,->| box |-'          `->| text |-'
   ,-->     +------+==>+------+=|===========|=>+-----+==============>+------+==>NULL
   |          x           x     |           |    x ^                    x
2. descend    v           v     |    xxxxxxxxxxxxx +                    v
  (goal)     NULL        NULL   |    x +++++++++++++                   NULL
                                |    v +    |
                                |  +------+ |
                                `->| text |-'
                                   +------+==>NULL

(See Structure Tree in hacking-layout.* for a description of the item tree.)
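The iteration scheme can be sketched as follows. It is a simplification under several assumptions: the handle member names are invented, form_next_sketch() takes the handle by pointer, and the "list_next" chain is assumed to visit all descendants of the form before reaching the form item itself (as the diagram suggests).

```c
#include <stddef.h>

struct Link {
    const char *name;
    const char *value;
    int enabled;
};

struct Item {
    struct Item *first_child;
    struct Item *list_next;
    struct Link *link;          /* only text items have links */
    int link_count;
};

struct Form_handle {
    struct Item *form_item;     /* the form being scanned */
    struct Item *cur_item;      /* item of the last returned link */
    int cur_link;               /* number of the last returned link */
    int filter;                 /* return only "successful" controls? */
};

struct Form_handle form_start_sketch(struct Item *form_item, int filter)
{
    struct Form_handle h;

    h.form_item = form_item;
    h.cur_item = form_item;
    while (h.cur_item->first_child != NULL)     /* descend to the first item */
        h.cur_item = h.cur_item->first_child;
    h.cur_link = -1;                            /* nothing returned yet */
    h.filter = filter;
    return h;
}

struct Link *form_next_sketch(struct Form_handle *h)
{
    /* Both loops use the handle fields as loop variables and do not
       reset them on entry, so scanning resumes after the last result. */
    for (; h->cur_item != NULL && h->cur_item != h->form_item;
         h->cur_item = h->cur_item->list_next, h->cur_link = -1) {
        for (++h->cur_link; h->cur_link < h->cur_item->link_count; ++h->cur_link) {
            struct Link *l = &h->cur_item->link[h->cur_link];

            if (!h->filter || (l->name != NULL && l->value != NULL && l->enabled))
                return l;
        }
    }
    return NULL;                /* no more controls in this form */
}
```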

url_encode()

url_encode() is responsible for creating the string that will be submitted to the server as part of the URL when a form is submitted using the "GET" method, or in the HTTP request body when using the "POST" method with URL encoding.

This function retrieves all (successful) form controls from the given form, using form_start()/form_next(). For each form control, the name and the value are stored, separated by an '=' and followed by an '&'. In both name and value, certain characters have to be escaped, which is done in the encode_string() helper function.

Spaces are replaced by a '+'. Special characters, including all non-ASCII-characters as well as a "real" '+' and some other characters with special meaning inside the URL ('&', '=', '%' and '#') are replaced by a hexadecimal representation, introduced with a '%' char. ("%HH") Other characters are stored directly.
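The escaping rules above can be sketched like this. Unlike the real encode_string(), this version writes into a caller-supplied buffer instead of growing its own, and it escapes exactly the characters named above; the real set of escaped characters may be larger.

```c
#include <stddef.h>
#include <string.h>

/* Encode "src" into "dst" (capacity dst_size, including the '\0').
   Returns the number of characters written, or -1 if dst is too small. */
int encode_string_sketch(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    for (; *src != '\0'; ++src) {
        unsigned char c = (unsigned char)*src;

        if (out + 4 > dst_size)     /* room for "%HH" plus the final '\0' */
            return -1;
        if (c == ' ') {
            dst[out++] = '+';
        } else if (c >= 0x80 || c == '+' || c == '&' || c == '=' ||
                   c == '%' || c == '#') {
            dst[out++] = '%';
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        } else {
            dst[out++] = (char)c;   /* stored directly */
        }
    }
    if (out + 1 > dst_size)
        return -1;
    dst[out] = '\0';
    return (int)out;
}
```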

The memory for the created string is allocated in chunks, for efficiency reasons. This is done inside encode_string(). As soon as the currently allocated size of the string doesn't suffice to hold the next character, it is grown by one chunk. The test is very crude, to keep it simple: The array is considered too small if there is space for less than 5 characters, as up to 5 characters may be stored before the next test, in the worst case: Up to 3 for the currently processed char (if it has to be hex-escaped), plus the following '=' if we are at the last character in the name, plus the '&' if the value for this form control is empty. (No checking is done before storing the '=' and '&', so we have to reserve the place for them here!)
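The crude size test might be sketched as below. The chunk size, the "struct Buf" bookkeeping and the function name are assumptions; only the reserve of 5 characters comes from the description above.

```c
#include <stdlib.h>

#define CHUNK   64      /* assumed chunk size */
#define RESERVE 5       /* "%HH" + '=' + '&' in the worst case */

struct Buf {
    char *data;
    size_t len;         /* characters stored so far */
    size_t size;        /* characters allocated */
};

/* Make sure at least RESERVE characters can be stored before the
   next check. Returns 0 on success, -1 if out of memory. */
int buf_reserve(struct Buf *b)
{
    if (b->size - b->len < RESERVE) {
        char *grown = realloc(b->data, b->size + CHUNK);
        if (grown == NULL)
            return -1;
        b->data = grown;
        b->size += CHUNK;
    }
    return 0;
}
```

Calling this once per input character keeps the check out of the code paths that store '=' and '&', at the price of growing slightly earlier than strictly necessary.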

mime_encode()

mime_encode() does the same as url_encode, except that the data is encoded using multipart MIME format.

This is much simpler: No encoding of the data itself is necessary; only the MIME header information has to be stored along with the data.

As the size of a data block can be easily calculated here, the string is first resized to fit the new block in each iteration (for each name/value pair), and then the data is stored with a simple sprintf() call. This should be much more efficient than the approach in url_encode for bigger data blocks, especially files. (Though these aren't implemented yet...)

The size is estimated by adding the size of the data strings ("name" and "value") to the size of the format string -- this is not exact (the format specifiers ("%s") are counted, though they are replaced in the resulting string; on the other hand, the trailing '\0' isn't counted explicitly), but the few bytes too many shouldn't matter. (Especially as they will be used up later anyway...)
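A sketch of this resize-then-sprintf() approach follows. The part format and the fixed "BOUNDARY" string are placeholders rather than the headers netrik actually emits, and mime_append() is a hypothetical name. Unlike the estimate described above, sizeof is used here, so the trailing '\0' is counted as well.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder part format; the real header text and boundary may differ. */
static const char part_fmt[] =
    "--BOUNDARY\r\n"
    "Content-Disposition: form-data; name=\"%s\"\r\n"
    "\r\n"
    "%s\r\n";

/* Append one name/value part to *buf (malloc'ed, or NULL when empty).
   *len is the current string length. Returns 0, or -1 if out of memory. */
int mime_append(char **buf, size_t *len, const char *name, const char *value)
{
    /* Overestimate: the two "%s" specifiers are counted although they
       get replaced; this leaves a few spare bytes per part. */
    size_t need = sizeof part_fmt + strlen(name) + strlen(value);
    char *grown = realloc(*buf, *len + need);

    if (grown == NULL)
        return -1;
    *buf = grown;
    *len += (size_t)sprintf(*buf + *len, part_fmt, name, value);
    return 0;
}
```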

Preparing

The form controls are extracted in parse_struct() (described in hacking-layout.*), just like normal links. The only difference (besides storing the "value" and "enabled" variables) is that the initial value/state of a form control has to be displayed on the page, which is done by set_form().

Manipulating

When <return> is pressed on a selected form control (display() returns RET_LINK, and "form" of "active_link" is not FORM_NO), special action is taken in main().

On a form control, this generally involves getting a new value (either implicitly by the link activation, or by explicit input), which is then saved either in "link->value" or in "link->enabled" (depending on the control type), and also stored to the item tree with update_form(). (So the new value will show up on the output page.)

For text and password input fields, the user is prompted for a new value, which is stored in "link->value".

For checkboxes, simply the "link->enabled" flag is toggled.

For radio buttons, "link->enabled" is set for the activated button, and reset for all other radio buttons in the form which have the same "name". This is done by iterating through all form items (with form_start()/form_next() without "filter"), and every time a control with the same name is found, disabling it. Reflecting the disabling in the item tree can't be done with update_form() however, as the link number in the link list isn't known here. (It's not returned nor used by form_next().) Thus, we grab the current string item from the handle used by form_next(), and call set_form() with that. This is a hack (it depends on the implementation of "struct Form_handle", which it shouldn't); however, I don't know any simple and reasonable way to implement that with the current link storing mechanism...
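The radio button rule itself is simple; the awkward part described above is only the display update. A sketch of the rule, using a flat array of links in place of form_start()/form_next() and assumed enumerator names (select_radio() is hypothetical, and no set_form() call is shown):

```c
#include <stddef.h>
#include <string.h>

enum Form_control { FORM_CHECKBOX, FORM_RADIO };    /* reduced; assumed names */

struct Link {
    enum Form_control form;
    const char *name;
    int enabled;
};

/* Enable the chosen radio button and disable all other radio buttons
   of the same form that share its name. */
void select_radio(struct Link *link, int count, int chosen)
{
    const char *group = link[chosen].name;

    for (int i = 0; i < count; ++i) {
        if (link[i].form != FORM_RADIO || link[i].name == NULL || group == NULL)
            continue;
        if (strcmp(link[i].name, group) == 0)
            link[i].enabled = 0;
    }
    link[chosen].enabled = 1;
}
```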

For <select> options, the behaviour depends on whether the select has the "multiple" attribute. (Which is coded in the link type: Either "FORM_OPTION" or "FORM_MULTIOPTION") If yes, they behave like checkboxes, otherwise like radio buttons.

On a submit button, a page load is performed (using load_page() as usual); the target URL is taken from the "data.form->url" field of the item representing the form to which the button belongs, retrieved with get_form_item(). The "form" parameter is set to the form item itself; load_page() passes this on to init_load(), where it is handled appropriately. (See hacking-load.*)

Submitting

When init_load() is called with a "form" argument, the form data is extracted and encoded, and passed to the HTTP server.

For forms using the "GET" submit method, this is done in init_load() itself. The data is extracted and encoded using url_encode(); the resulting CGI parameter string is then passed to merge_urls(), where it is stored as part of the resulting target URL (in place of any other CGI parameters). As the form data is now part of the URL, no other special handling is necessary -- it will be passed to the server with the URL.

For "POST" forms, the form item is simply passed on to http_init_load() and from there to get_http_cmd(), where either url_encode() or mime_encode() is used to get the encoded form data, which is then stored in the body of the HTTP request and passed to the server.

Anchors

If the loaded link contains a fragment identifier, "active_anchor" is set to the desired anchor after loading the page. This is done in load_page() (described in hacking-layout.*), after all other steps of the loading process are completed.

Similar to links, anchors are extracted while parsing the page structure. However, they are stored in another way: Every anchor has an own item, either of type ITEM_BLOCK_ANCHOR or ITEM_INLINE_ANCHOR, depending on the anchor type. Both types are Virtual Items; this is described in hacking-layout.*.

After extracting the anchors, the "anchor_list" data structure is created (using make_anchor_list()), which is very similar to the "link_list". It consists only of the total anchor count, and an array containing pointers to every anchor item.

Jumping to an anchor primarily implies retrieving the matching anchor number. This is done (inside load_page()) by comparing the fragment identifier given in the URL with the names of all anchors from the anchor list. When the right anchor is found, its entry number in the anchor list is stored in "page->active_anchor".
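The lookup amounts to a linear search over the anchor list. In this sketch the structure members and the function name are assumptions; the anchor items are reduced to their names.

```c
#include <string.h>

struct Anchor_item {
    const char *name;               /* anchor name from the page */
};

struct Anchor_list {
    int count;
    struct Anchor_item **anchor;    /* pointers to all anchor items */
};

/* Return the list entry number of the anchor matching "fragment",
   or -1 if the page has no such anchor. */
int find_anchor(const struct Anchor_list *list, const char *fragment)
{
    for (int i = 0; i < list->count; ++i)
        if (strcmp(list->anchor[i]->name, fragment) == 0)
            return i;
    return -1;
}
```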

Displaying the anchor itself is done in the pager. When display() (described in hacking-pager.*) finds an "active_anchor" while starting up, it calls activate_anchor() to jump to and show the anchor position.

activate_anchor()

This function first calculates the position to scroll the page to. This position depends on the setting of the "anchor_offset" config variable: If this has a nonzero value, the page is scrolled so that the anchor will start at the reciprocal value of "anchor_offset" of the screen height; when "anchor_offset" is five for example (current default), we will scroll to the anchor start position minus one fifth of LINES -- the anchor start will appear one fifth of the screen height below the screen top. If "anchor_offset" is zero, the anchor will be shown "link_margin" lines below the screen top instead.
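As a worked example of this calculation, here is a sketch with all inputs passed as parameters. The function name is hypothetical, and plain integer division is an assumption about how the fraction of the screen height is rounded.

```c
/* Requested scroll position for an anchor starting at page line
   "anchor_y_start", on a screen with "lines" lines. */
int anchor_scroll_pos(int anchor_y_start, int lines,
                      int anchor_offset, int link_margin)
{
    if (anchor_offset != 0)
        return anchor_y_start - lines / anchor_offset;
    return anchor_y_start - link_margin;
}
```

With a 25-line screen and "anchor_offset" set to five, an anchor at page line 100 gives a requested position of 100 - 25/5 = 95, so the anchor appears 5 lines below the screen top. The result can be negative for anchors near the page top; correcting that is left to scroll_to().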

With this calculated "optimal" scroll position scroll_to() is called, and returns the actual pager position -- this may differ from the requested position when the anchor is near the screen top or bottom. Having this, the actual screen positions of the anchor start and end are calculated. If these are identical (empty block anchor), the end position (pointing, as always, *after* the last anchor line) is incremented by one, so that a mark is displayed in the next line. The theoretical start and end positions are then truncated to the screen boundaries.
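The screen span calculation can be sketched as follows; "struct Span" and the function name are hypothetical, and the anchor's page coordinates are passed in directly.

```c
struct Span {
    int start;      /* first screen line of the anchor */
    int end;        /* screen line *after* the last anchor line */
};

/* Convert the page lines of an anchor to screen lines, given the actual
   pager position returned by scroll_to() and the screen height. */
struct Span anchor_screen_span(int y_start, int y_end, int pager_pos, int lines)
{
    struct Span s;

    s.start = y_start - pager_pos;
    s.end = y_end - pager_pos;
    if (s.end == s.start)       /* empty block anchor: mark one line */
        ++s.end;
    if (s.start < 0)            /* truncate to the screen boundaries */
        s.start = 0;
    if (s.end > lines)
        s.end = lines;
    return s;
}
```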

Finally, a mark is printed in the rightmost column of every screen line in the area spanned by the anchor, as calculated before. These marks are printed directly to the curses screen; they aren't stored in the item tree.

When activate_anchor() is called with -1 as anchor number, the previously activated anchor is deactivated. The page isn't scrolled in this case, but the screen position of the anchor is calculated as well; the marks are cleared by re-rendering the area where the marks were drawn. (Calling render() (described in hacking-layout.*) with the "overpaint" flag.)

The deactivating is done by the pager after the first keypress. (*After* the associated function was performed.)