Notes on the Foreign Function Interface (ffi) - 22 June 2002

N.B. The Hugs FFI implementation has changed significantly since the
December 2001 release.

Suppose you have some C functions in test.c and some ffi declarations
for those functions in Test.hs and the code in test.c needs to be
compiled with -lm.  You can use these with Hugs as follows:

  # Use ffihugs to generate Test.c, compile it, and link it against
  # test.c and -lm to produce Test.so
  #
  # [If Test.hs depends on other ffi modules, you'll have to compile
  # them first.]
  ffihugs +G +L"test.c" +L"-lm" Test.hs 

  # Run Hugs as normal - when Test.hs is loaded, it will load Test.so
  hugs Test.hs

  # And now try using the imported or exported functions.

This release implements the Haskell foreign function interface
definition release candidate 7:

  http://www.cse.unsw.edu.au/~chak/haskell/ffi/

with a few minor caveats (excruciating details appended at the end).


Enjoy!

--
Alastair Reid                 alastair@reid-consulting-uk.ltd.uk  
Reid Consulting (UK) Limited  http://www.reid-consulting-uk.ltd.uk/alastair/


Known limitations:

o Only the ccall calling convention is supported.  All others are flagged as
  errors.

o foreign export is not implemented

o foreign import wrappers are only implemented for the x86, PowerPC
  and Sparc architectures and have been most thoroughly tested on
  Windows and Linux using gcc.

  Porting should be easy for any experienced assembly language
  programmer - especially one who first looks at
  fptools/ghc/rts/Adjustor.c in the GHC source tree.  The following
  information is intended for those brave souls who try to port the
  implementation to other architectures and can be safely ignored by
  everyone else.

  To make foreign import wrappers work for other architectures, you
  have to modify the function mkThunk in hugs98/src/builtin.c to
  generate a short sequence of machine code (and then send your
  fix to hugs-bugs@haskell.org for inclusion in the next release).

  The goal of the code is (more or less) to implement this C function:

    rty f(ty1 a1, ... tym am) {
      return (*app)(s,a1, ... am);
    }
 
  where rty, ty1, ... tym are C types, app is an "apply" function
  generated by running "ffihugs +G" and "s" is a "stable pointer" to the
  Haskell function being wrapped.  The function is written in machine
  code for the following reasons:

  o For foreign import wrappers the function has to be generated
    dynamically and neither ANSI C nor any extensions we know of let
    you generate C functions at runtime.  The alternative of 
    invoking the C compiler and loader at runtime is not attractive.
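
    To make the target concrete, here is the C function above written
    out by hand for one fixed type (int -> int -> int) and two fixed
    stable pointers.  This is only a sketch: HugsStablePtr is assumed to
    be an int-sized handle, and app_int_int, wrapped_add and wrapped_mul
    are hypothetical names; the real "apply" function would call into
    the Hugs evaluator instead of dispatching on s.

```c
/* Stand-in for the real Hugs type (assumption: an int-sized handle). */
typedef int HugsStablePtr;

/* Hypothetical stand-in for the "apply" function that "ffihugs +G"
 * would generate for the type int -> int -> int.  It dispatches on s
 * only so that this sketch can run without Hugs. */
int app_int_int(HugsStablePtr s, int a1, int a2)
{
    return (s == 1) ? a1 + a2 : a1 * a2;
}

int (*app)(HugsStablePtr, int, int) = app_int_int;

/* One wrapper per wrapped Haskell function, each with its own s baked
 * in.  This is what the machine code thunk has to do for an s that is
 * only known at runtime. */
int wrapped_add(int a1, int a2) { return (*app)(1, a1, a2); }
int wrapped_mul(int a1, int a2) { return (*app)(2, a1, a2); }
```

    Every new value of s needs another such function, which is exactly
    what C cannot produce at runtime.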

  o The code has to be placed next to a data structure in memory.
    The data structure has this type:
      
      struct thunk_data {
          struct thunk_data* next;
          struct thunk_data* prev;
          HugsStablePtr      stable;
          char               code[16];
      };

    The next and prev pointers are used to implement a doubly-linked list 
    used by the garbage collector to keep track of all wrapped
    functions.

    The stable field stores a stable pointer to the Haskell function being
    wrapped.  This is used by the garbage collector.

    The code field stores the machine code.  It is expected that the size
    will have to be changed for other architectures.
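
    The list maintenance implied by next and prev can be sketched as
    follows.  The list head all_thunks and the helpers link_thunk,
    unlink_thunk and count_thunks are hypothetical names chosen for
    this illustration, and HugsStablePtr is again assumed to be an
    int-sized handle; the code in builtin.c may be organised differently.

```c
#include <stddef.h>

typedef int HugsStablePtr;   /* stand-in type (assumption) */

struct thunk_data {
    struct thunk_data* next;
    struct thunk_data* prev;
    HugsStablePtr      stable;
    char               code[16];
};

/* Hypothetical head of the list of all live wrappers. */
struct thunk_data* all_thunks = NULL;

/* Record s in the thunk and insert the thunk at the front of the list. */
void link_thunk(struct thunk_data* t, HugsStablePtr s)
{
    t->stable = s;
    t->prev   = NULL;
    t->next   = all_thunks;
    if (all_thunks != NULL)
        all_thunks->prev = t;
    all_thunks = t;
}

/* Remove the thunk from the list, e.g. when the wrapper is freed. */
void unlink_thunk(struct thunk_data* t)
{
    if (t->prev != NULL)
        t->prev->next = t->next;
    else
        all_thunks = t->next;
    if (t->next != NULL)
        t->next->prev = t->prev;
    t->next = t->prev = NULL;
}

/* Walk the list the way a collector might when visiting every
 * stable pointer that is still referenced by a wrapper. */
int count_thunks(void)
{
    int n = 0;
    struct thunk_data* t;
    for (t = all_thunks; t != NULL; t = t->next)
        n++;
    return n;
}
```

    The prev pointer is what makes removal of an arbitrary thunk a
    constant-time operation.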

  o By writing in assembly/machine code, it is possible to use the
    same code sequence no matter what the function type is.  This
    works because the C calling convention on most machines has the
    stack looking something like this (the stack grows downwards in
    this picture)
    
         |  ...   |
         +--------+
         |  argm  |
         +--------+
            ...  
         +--------+
         |  arg2  |
         +--------+
         |  arg1  |
         +--------+
         |ret_addr|
         +--------+
    
    This calling convention is more or less imposed by the need to 
    support vararg functions in C. 

    To implement the above function, all we need to do is adjust the
    stack to look like this:


         |  ...   |
         +--------+
         |  argm  |
         +--------+
            ...  
         +--------+
         |  arg2  |
         +--------+
         |  arg1  |
         +--------+
         |   s    |
         +--------+
         |ret_addr|
         +--------+
    
    and jump to (tailcall) the start of app.

    On the x86, you can do this with the following code sequence:
    
      pushl (%esp)      ; move the return address "up"
      movl  $s,4(%esp)  ; stick the stable pointer "under" it
      jmp   app         ; tail call app
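
    The effect of the first two instructions can be checked with a toy
    model of the stack in C.  This is an illustration only: the struct
    toy_stack and the sim_* functions are made up for this sketch, the
    stack is an array of 32-bit words in which a higher index means a
    higher address, and sp is the index of the top-of-stack word.

```c
#include <stdint.h>

enum { STACK_WORDS = 8 };

struct toy_stack {
    uint32_t word[STACK_WORDS];
    int      sp;                 /* index of the top-of-stack word */
};

/* pushl (%esp): read the current top word (the return address), then
 * push a copy of it, which moves sp down by one word. */
void sim_pushl_top(struct toy_stack* st)
{
    uint32_t top = st->word[st->sp];
    st->sp -= 1;
    st->word[st->sp] = top;
}

/* movl $s,4(%esp): overwrite the word one slot above the new top,
 * i.e. the slot that used to hold the return address. */
void sim_store_above_top(struct toy_stack* st, uint32_t s)
{
    st->word[st->sp + 1] = s;
}

/* The stack adjustment performed by the thunk before it jumps to app. */
void sim_thunk(struct toy_stack* st, uint32_t s)
{
    sim_pushl_top(st);
    sim_store_above_top(st, s);
}
```

    Starting from ret_addr, arg1, arg2 (top first), sim_thunk leaves
    ret_addr, s, arg1, arg2, which is the layout that app expects.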

    On the Sparc, alignment restrictions require that we add a
    doubleword.

    On very different architectures, you can
    (hopefully) get things working by passing the stable pointer in a
    global variable or, perhaps, a callee-saves register and tweaking
    the "app" function (which is generated by implementForeignImportWrapper
    in ffi.c) to expect "s" in that variable instead of on the stack.
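
    A rough C rendering of the global variable approach is shown below.
    All names here (pending_stable, app_from_global, thunk_for_add,
    thunk_for_mul) are hypothetical, HugsStablePtr is assumed to be an
    int-sized handle, and the thunks are written in C only for
    illustration; real thunks would still be machine code that stores
    s and jumps.

```c
typedef int HugsStablePtr;   /* stand-in type (assumption) */

/* Hypothetical global through which the thunk hands s to app. */
HugsStablePtr pending_stable;

/* Hypothetical variant of "app" for int -> int -> int that takes no
 * s parameter.  It copies the global on entry, before anything else
 * can run and overwrite it.  The dispatch on s stands in for the
 * call into the Hugs evaluator. */
int app_from_global(int a1, int a2)
{
    HugsStablePtr s = pending_stable;
    return (s == 1) ? a1 + a2 : a1 * a2;
}

/* What the thunks for stable pointers 1 and 2 would do: store s in
 * the global, leave the arguments untouched, and transfer control. */
int thunk_for_add(int a1, int a2)
{
    pending_stable = 1;
    return app_from_global(a1, a2);
}

int thunk_for_mul(int a1, int a2)
{
    pending_stable = 2;
    return app_from_global(a1, a2);
}
```

    Since the arguments are not moved, the thunk no longer depends on
    the stack layout.  The price is that app must read the global
    immediately, because a nested callback would overwrite it.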

  o It is machine code instead of assembly code because we don't want
    to invoke an assembler and linker/loader at runtime.  

    Having determined which assembly code sequence to use, use 
    "as -a" (or equivalent) to view the corresponding machine code and
    then write C code which will insert that code into the code field 
    of a thunk.  

    For the x86, the code looks like this:

      #if defined(__i386__)
          /* 3 bytes: pushl (%esp) */
          *pc++ = 0xff; *pc++ = 0x34; *pc++ = 0x24;  
      
          /* 8 bytes: movl $s,4(%esp) */
          *pc++ = 0xc7; *pc++ = 0x44; *pc++ = 0x24; *pc++ = 0x04; 
          *((HugsStablePtr*)pc)++ = s;
      
          /* 5 bytes: jmp app */
          *pc++ = 0xe9;
          *((int*)pc)++ = (char*)app - ((char*)&(thunk->code[16]));
      #else
          ...
      #endif
           
    This code contains a copy of the stable pointer because it is
    convenient to do this on the x86.  On architectures such as the
    Sparc where 32-bit immediate loads are more painful, it may be
    easier to load the copy of the stable pointer stored in the 
    thunk - this is stored at a fixed offset from the code.
    Likewise, it may be convenient to add a copy of "app" to the
    thunk struct.
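
    The x86 snippet above relies on using a cast as an lvalue
    ("*((int*)pc)++"), a GNU C extension that newer versions of gcc no
    longer accept.  The same 16 bytes can be produced in standard C as
    sketched below.  The names put_le32 and emit_x86_thunk are
    hypothetical, and addresses are passed as 32-bit integers so that
    the sketch can be checked on any host.

```c
#include <stddef.h>
#include <stdint.h>

enum { THUNK_CODE_SIZE = 16 };

/* Store a 32-bit value in little-endian byte order, as the x86 expects. */
static unsigned char* put_le32(unsigned char* pc, uint32_t v)
{
    *pc++ = (unsigned char)(v & 0xff);
    *pc++ = (unsigned char)((v >> 8) & 0xff);
    *pc++ = (unsigned char)((v >> 16) & 0xff);
    *pc++ = (unsigned char)((v >> 24) & 0xff);
    return pc;
}

/* Fill code[] with the thunk for stable pointer s.
 * code_addr is the address at which code[0] will execute and
 * app_addr is the address of app.  Returns the number of bytes written. */
size_t emit_x86_thunk(unsigned char code[THUNK_CODE_SIZE],
                      uint32_t s, uint32_t code_addr, uint32_t app_addr)
{
    unsigned char* pc = code;

    /* 3 bytes: pushl (%esp) */
    *pc++ = 0xff; *pc++ = 0x34; *pc++ = 0x24;

    /* 8 bytes: movl $s,4(%esp) */
    *pc++ = 0xc7; *pc++ = 0x44; *pc++ = 0x24; *pc++ = 0x04;
    pc = put_le32(pc, s);

    /* 5 bytes: jmp app.  The displacement is relative to the address
     * of the byte that follows the jump, i.e. the end of code[]. */
    *pc++ = 0xe9;
    pc = put_le32(pc, app_addr - (code_addr + THUNK_CODE_SIZE));

    return (size_t)(pc - code);
}
```

    For example, with code at 0x1000 and app at 0x2000 the jump
    displacement is 0x2000 - 0x1010 = 0xff0.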