This document aims at giving detailed informations on how Talpo works, it is
not necessary to read it. In all case, you should have read the README first as
it contains the base to start with Talpo.


########## How passes are generated and inserted into GCC ##########

The main Talpo feature is to convert user given tests directly into GCC passes.
This is done by the file Talpo_dispatcher. It inserts quite early a main pass
which has the role to do nothing except creating the new passes. This first pass
is run after cfg because it is an early pass and we are quite sure to run it
whatever optimisation are turned on for the current compilation. It is run only
if there is at least one test to insert, so it might not slow down compilation
time if there is files for which we don't run any Talpo tests.

Before inserting a test, we check that the given numbers of argument correspond
to what the test can handle, if everything is correct we convert it to a GCC
pass inserted after SSA pass. The idea of choosing this pass comes from the fact
that it is quite a standard and well-know pass (see
http://en.wikipedia.org/wiki/Static_single_assignment_form) with particularities
useful for different kind of analyse. One drawback that I discovered later is
that we can get quite easily some variables added by the compiler (I don't speak
of different versions of the original source variables but full generated
variables) such as in the following example:

## C code begin ##

typedef struct _myStr
{
  FILE * ptrfile;
}myStr;

int main (void){
  int i=1;
  myStr * testStr = (myStr *) malloc(sizeof(myStr));
  if(i<1)
    testStr->ptrfile=fopen("toto", "a");
  if(!testStr->ptrfile){
    return 0;
  }
  return 1;
}

## C code ending ##

Corresponding SSA code:

## SSA code begin ##
main ()
{
  struct myStr * testStr;
  int i;
  int D.4086;
  struct FILE * D.4083;
  struct FILE * D.4082;
  const char * restrict D.4081;
  const char * restrict D.4080;

<bb 2>:
  i_2 = 1;
  testStr_3 = malloc (8);
  if (i_2 <= 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  D.4080_4 = (const char * restrict) &"a"[0];
  D.4081_5 = (const char * restrict) &"toto"[0];
  D.4082_6 = fopen (D.4081_5, D.4080_4);
  testStr_3->ptrfile = D.4082_6;

<bb 4>:
  D.4083_7 = testStr_3->ptrfile;
  if (D.4083_7 == 0B)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  D.4086_8 = 0;
  goto <bb 7> (<L4>);

<bb 6>:
  D.4086_9 = 1;

  # D.4086_1 = PHI <D.4086_8(5), D.4086_9(6)>
<L4>:
  return D.4086_1;

}

## SSA code ending ##

Here if the goal is to test that the result of a call to fopen is 'immediately'
followed by a test, we have to handle that:
    - we use several variables of different types to transmit the returned
value: 
    * some variables are of type tree_ssa_name (D.4082)
    * some are of type tree_decl (This is not used here for handling fopen
				  return but that would be a variable such as i)
    * some variables are of type tree_component_ref (testStr->ptrfile)

    - We have to take in account that each of this intermediate assignations are
    useful in the context of the test (other way we could not say that fopen is
    'immediately' followed by a test, so it can lead to a quite complex parsing.

We have seen the problem in the context of a 'testNull' testModel, but the
problem is the same for many test models, that why functions that allow to
handle that cases are considered as general and are available to every test
model from the file talpo_init_test.melt.

As some of those variables are recognize by using id (tree_ssa_name) and others
using a string (tree_decl), I have created a structure abstracting their
identifier (class_talpo_test_var_ident) and a function able to compare them
properly (tree_match_same_var). If we need to add a new type of variable with
such comparison it is enough to add 3 matching lines in function
matching_var_uid.


########## Detail of correct /false positive warnings ##########

The goal of Talpo is not to return in every situation the correct diagnostic (a
warning or not).  But to be right in most situation. The idea is to start with a
dummy plugin and to add more precision over the time. At the beginning, it
just checked very basic things and can now handle more cases: using dominance
info, recognizing structure, function pointers... This is quite sure that there
is a lot of unmanaged situations, the goal is to reduce them over the time.

The following showed examples have no goal of completeness but are quite useful
to get an idea of the state of the project. This is short example with the aim
of showing specific behaviour.


## For TestNull/TestZero/TestNeg ##
 
  Let's say we want to run the following test:
  (testNull fopen)
  * The following examples are correctly handle:

####################
int main()
{
  FILE * ptr;
  int i = 1;
  if( i==0 )
    ptr = fopen ("path", "a");
  else
    ptr = fopen ("anotherPath", "a");
  if(ptr)
    ...
}
  This test does not return any warnings (as wanted).
  For handling correctly this case, Talpo use dominance relation betweens
  basicblock: it detects that in each case, the fopen call is the last statement
  of the current basicblock and so, it checks the first statements of their next
  post-dominating basicblock.

####################

int main()
{
  FILE * ptr;
  int i = 1;
  while( i < 10 )
    i++;
    ptr = fopen ("path", "a");
  if(ptr)
    ...
}

This is a quite idiot example in the context of the fopen function (but could be
possible with another one). However, has there is successive calls which are not
tested it returns a warning as expected.
###################


Talpo correctly handle structure and typedef.

typedef struct _myStr
{
  FILE * ptrfile;
}myStr;

int main (void){
  int i=1;
  myStr * testStr = (myStr *) malloc(sizeof(myStr));
  testStr->ptrfile=fopen("toto", "a");
  if(!testStr->ptrfile){
    return 0;
  }
  return 1;
}



###################

  * The following code will produce a false Talpo warning:

int main()
{
  FILE * res = fopen ("path","a")
  int i = 1;
  if(res)
      return 0;
  return 1;
}

  This come from the fact we want the test to follow immediately the call, here
  the assignation of i is considered as another instruction that should have
  been done later. The only way to correct this would be to accept every
  variable assignation (not changing the value of the variable we want to check)
  to be skipped.


###################

  Pointer over function are not detected:
int main()
{
  FILE * (*myPtr)(char *, char *);
  myPtr = fopen;

  FILE * res = myPtr("path","a");
  return 0;
}

Talpo will not emit warning on this code as it does not detect pointer over
functions. To keep correct performance and a simple system, we should improve
gimple_call detection to take in account that it is a function pointer (it will
not be matched currently) and to look at the reference of the pointer. 

###################

  The following case has been asked by Jonathan Wakely on the GCC mailing list:

struct Guard {
    Guard(void* p) : p(p) { if (!p) throw std::bad_alloc(); }
    ~Guard() { grub_free(p); }
    void* p;
};

void func(grub_size_t n)
{
    Guard g(grub_malloc(n));
    // do something with g.p
}
 
In this C++ case, Talpo does not work (it emits a false warning). This looks far
more complicated to handle than just handling function pointer, because, there
is a function context change. As we only analyse one function a time, we should
insert a new pass looking at Guard constructor, remembering that we had
something in function fun. There might also be some interesting things to do
using IPA, however I have to look deeper at this.


## For TestImmediatelyFollowedBy/TestNotImmediatelyFollowedBy ##

This is somewhat related to the testNull/testZero/testNeg case. This use in the
same way post-dominance relations and has the same drawbacks (if there is a
variable between the two searched functions, it will badly warn, it should have
a more complex function pointer handling (as there can be pointer over the two
functions).

  Let's say we want to run the following test:
(testImmediatelyFollowedBy "chroot" 1 "chdir" 1)


  * The following example is correctly handle:

####################
int main()
{
  char * curDir = (char *) malloc (sizeof(char) *3);
  char * notSamePtr = "../";
  curDir = "."
  chroot(curDir);
  chdir(notSamePtr);
  return 0;
}

It returns a warning as expected because chdir does not operate on the same
variable as curDir.

####################


  * The following example is incorrectly handle:


####################
int main()
{
  char * curDir = (char *) malloc (sizeof(char) *3);
  char * samePtr;
  curDir = "."
  samePtr = curDir;
  chroot(curDir);
  chdir(samePtr);
  return 0;
}

It returns a warning as it considers samePtr to be different from curDir while
it points on the same object. This can be solved using the same kind of method
than searching for function pointer, this oblige to look closely to determine
that they have the same reference.
####################


## For TestFollowedBy ##

  This test is quite close to the previous one, except that we don't have to
  check that the two call are successive (so, it is easier to manage). The only
  difficulty was that usually gimple are successively parsed. Here It starts to
  parse them successively but when It meet a call to the first function, It cannot
  continue to parse successively until It finds the call to the next function:
    1) It must not search second function in basicblock which does not
    postdominate the basicblock where Talpo found the first class.
    2) It must handle the fact that we can have another call to the first
    function before finding the second one.

  So Once, a the first call is found, we call a function parsing the end of the
  current basicblock + eventually each post-dominating blocks. Once found (or
  each post-dominators parsed) we continue from the point where the first call
  was found.

########## What should be added to Talpo ##########

As you can see there is still many work do be done on Talpo, out of the false
positives the following could be done:

  * Adding function pointer support.

  * One of the most necessary thing to be done is to use tree declaration
  instead of string when searching for the different functions. This will allows
  to consider a correct C++ support (Talpo has mainly be tested for C language).
  The second point is that instead of comparing strings, we will compare tree,
  which should me far more faster. This might not be very difficult to add, but
  depends mainly on the recently added patch to GCC trunk
  (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00433.html) which add a hook
  FINISH_DECL for plugin, allowing to parse each function declaration. MELT
  should be updated to allow use of this hook. Then we could have an early
  treatment replacing every string by a tree corresponding to the found
  function declaration.

  * Ignoring #DEBUG information in gimple code, so Talpo could work normally
  even when -g is enable.

  * An help field already exist for test model and front-end, an interface for
  the user to look at them should be added. 

  * We could add a test verifying that a function is followed by a call to
  another function or is returned. This would be quite useful in several
  project. For example a malloc'ed variable should be freed in the same
  function or returned.

  * A way to list functions which obey to specific rules, for example that we
  don't want to analyse. 

  * A minimal regular expression support : This would allow not to search a
  single function but a list of function corresponding to the pattern.
