file: 04.2.analysisOfWork.txt
author: Bob Muller
date: February 11, 2014

CS102 Computer Science 102
Spring 2014
Lecture Notes for Meeting 1 of Week 5

Topics:

0. Revisiting the ResizingArray ADTs
1. Searching for an item in an Array

-----------------------------------------------------------------

Notes:

0. There have been further questions about how -static- works. Here's
a simple example:

  public class A {

    // A static function.
    //
    public static void greeting1() { System.out.println("Hello 1"); }

    // The following one is NOT static.
    //
    public void greeting2() { System.out.println("Hello 2"); }

    public static void main(String[] args) {
      A.greeting1();    // NB: the class name A!

      A a = new A();
      a.greeting2();    // NB: no class name, just a variable 'a',
                        // the value of which is of type A.
    }
  }

-----------------------------------------------------------------

0. Revisiting the ResizingArray ADTs

We'll consider representations of Stacks here, but representations of
Queues would have served the same purpose. Our old friend, the
fixed-size sequential Stack, had a push operation written something
like:

  public void push(T info) {
    this.a[this.N] = info;
    this.N = this.N + 1;
  }

So it's fair and reasonable for a client of our ADT to ask: if I use
the push operation, how much -work- will be done? By work, we mean
primarily how long it will take, but we are also interested in how
much storage will be required.

In the case above, in addition to the call of the push function, two
assignment statements must be executed. We can safely ascribe a unit
time to each task and report that the push operation requires 3 steps.

Resizing Arrays

Of course the fixed stacks could overflow, so we provided a more
advanced implementation that resizes the array when push finds the
stack full:

  // resize the underlying array holding the elements
  //
  01. private void resize(int capacity) {
  02.   T[] temp = (T[]) new Object[capacity];
  03.   for (int i = 0; i < N; i++) {
  04.     temp[i] = this.a[i];
  05.   }
  06.   this.a = temp;
  07. }

  08. public void push(T info) {
        // double the size of the array if necessary.
        //
  09.   if (this.N == a.length) resize(2 * a.length);
  10.   this.a[this.N] = info;   // add info to the stack.
  11.   this.N = this.N + 1;
  12. }

Now, in thinking about how to answer the "how much work?" question, we
see that the answer depends. If a call of push happens to occur when
the stack isn't full, the answer is that it will take FOUR units of
time (we had to add one unit for the comparison done in line 09). But
if, on the other hand, the push happens to occur when the stack is
full, then the answer -depends on- N, the number of elements in the
stack. Counting up the steps for a full-stack push, we have:

- 1 step for the call of push,
- 2 steps in line 09, the compare and the call of resize,
- 1 step in line 10 and
- 1 step in line 11.

In resize we have:

- 1 step in line 02,
- 1 step in line 03,
- N steps in line 04 and
- 1 step in line 06.

So all told, push requires N + 8 steps of work when the stack of size
N is full.

An important point: when we say "depends on N" we mean that -it is a
function of N-; that is, the amount of work that a piece of code is
required to do is given by a function that accepts the size of the
input and produces the amount of work as a result.

We will use the Greek symbol lambda (when in text, we use \) to denote
an anonymous function. For example, \N.6 is an anonymous function of
one variable N. It returns 6 every time it is called. We might be more
familiar with this function introduced with a (useless) name f:

  f(N) = 6

This gives the uninformative name "f" to the function. We could as
well have written:

  f = \N.6

So when we refer to the amount of work a given piece of code, say
-myCode-, carries out, we will write it as:

  work(myCode) = \N.expression

where N is an input variable, the -formal parameter-, and where
expression is an integer formula quantifying the amount of work
required.
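To watch the N-dependent cost of a full-stack push concretely, here is a
minimal self-contained sketch (the class name CountingStack and the copies
counter are my own illustrative additions, not part of the course code); it
uses a plain int array to sidestep the generic-array cast, and counts the
element copies that resize performs:

```java
// A sketch of the resizing stack that counts copying work.
// CountingStack and the 'copies' field are illustrative names,
// not from the course's ResizingArray code.
public class CountingStack {
    private int[] a = new int[1]; // start with capacity 1
    private int N = 0;            // number of items on the stack
    public int copies = 0;        // total elements copied by resize

    private void resize(int capacity) {
        int[] temp = new int[capacity];
        for (int i = 0; i < N; i++) {
            temp[i] = a[i];
            copies = copies + 1;  // one unit of copying work
        }
        a = temp;
    }

    public void push(int info) {
        if (N == a.length) resize(2 * a.length); // double when full
        a[N] = info;
        N = N + 1;
    }

    public int size() { return N; }

    public static void main(String[] args) {
        CountingStack s = new CountingStack();
        for (int i = 0; i < 16; i++) s.push(i);
        // Doubling from capacity 1 copies 1 + 2 + 4 + 8 = 15 elements
        // over 16 pushes: under one copy per push on average, even
        // though the individual resizing pushes cost N + 8 steps.
        System.out.println(s.size() + " items, " + s.copies + " copies");
    }
}
```

Note that although a single push on a full stack does work that grows with
N, the doubling strategy spreads that cost out, so a long run of pushes
averages only a constant amount of copying per push.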
For example,

  work(pushFixed)    = \N.3       ---- constant work of 3 units
  work(pushResizing) = \N.N + 8   ---- linear growth of work
  work(LinearSearch) = \N.N       ---- linear
  work(BinarySearch) = \N.log N   ---- logarithmic growth

Big O Notation

It turns out that we usually don't need to know the -exact- number of
steps required; we are primarily interested in the asymptotic behavior
of work functions. That is, we are interested in the amount of work
they do on large inputs, and we will ignore properties of work
functions that don't matter when N is large. This idea is captured by
the three BIG functions:

- O(\N.exp)     --- characterizes asymptotic upper bounds,
- Omega(\N.exp) --- characterizes asymptotic lower bounds,
- Theta(\N.exp) --- characterizes asymptotically tight bounds above
                    AND below.

  O(g) = { f | exist c > 0, n0 st forall n > n0: f(n) <= c * g(n) }

In English, O(g) is the set of functions f that are bounded above by
g: for all n above a threshold n0, f(n) stays within a constant factor
c of g(n).

For example, \N.N + 8 is in O(\N.N). Why? Choose n0 = 8 and c = 2:
for all n > 8 we have n + 8 <= 2n.

There are just a few functions of N that serve as representatives of
the "categories" of functions we will encounter:

  \N.C       --- requires C units of work, where C is some constant,
  \N.log_2 N --- requires a logarithmic number of steps in N,
  \N.N       --- requires N units of work, i.e., it is linear,
  \N.N^2     --- requires a quadratic number of steps,
  \N.N^3     --- requires a cubic number of steps,
  \N.2^N     --- requires an exponential number of steps, completely
                 impractical for all but small values of N.

Simple Search

In the accompanying code, we consider the problem of looking for an
integer -key- in an array of N keys that are not stored in any order.
Looking through sequentially from 0 through N-1, we do linear work.
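The linear and logarithmic searches mentioned above can be sketched as
follows (this is my own minimal version, not the course's accompanying
code; the class and method names are illustrative):

```java
// A sketch contrasting linear search (\N.N work) with binary
// search on a sorted array (\N.log N work). Names are illustrative.
public class SearchDemo {

    // Linear search: examine keys 0 through N-1 in turn.
    // Up to N comparisons. Returns the index of key, or -1.
    public static int linearSearch(int key, int[] a) {
        for (int i = 0; i < a.length; i++)
            if (a[i] == key) return i;
        return -1;
    }

    // Binary search on a SORTED array: each comparison halves the
    // remaining range, so about log_2 N comparisons suffice.
    public static int binarySearch(int key, int[] a) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if      (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else return mid;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 13, 21, 34};
        System.out.println(linearSearch(13, sorted)); // prints 3
        System.out.println(binarySearch(13, sorted)); // prints 3
        System.out.println(binarySearch(7, sorted));  // prints -1
    }
}
```

The halving is why the whitelist experiment below speeds up so
dramatically once the entries are sorted: for N = 10^6, linear search
may make a million comparisons per key, while binary search makes
about twenty.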
In the case of the largeW.txt whitelist file, which has 10^6 entries,
and the largeT.txt keys file, which has 10^7 entries, it took my
computer a little over 60 minutes to process all of the keys.

In the special case in which the whitelist entries are in sorted
order, we can use the more efficient binary search algorithm. Looking
through a sorted version of the 10^6 entries in the whitelist file for
the same 10^7 keys in the largeT.txt file, my computer completed the
task in 23 seconds.

Sorting

Sorting algorithms are the subject of section 2 of SW. Section 2.1
covers elementary algorithms, including insertion sort and selection
sort. Also, see David Martin's sorting animations website:

  http://www.sorting-algorithms.com/

These elementary sorting algorithms have O(\N.N^2) complexity in the
average and worst cases. (Although insertion sort is linear in the
best case of nearly sorted input!)

Section 2.2 of SW covers von Neumann's mergesort algorithm, section
2.3 covers Hoare's quicksort algorithm and section 2.4 covers heapsort
(as well as priority queues!). On average, all of mergesort, quicksort
and heapsort can sort keys using O(\N.N log N) comparisons of keys. It
turns out that quicksort has quadratic complexity in the worst case,
so if your application must absolutely run quickly in all cases,
you'll probably want to avoid quicksort.

Mergesort

Mergesort (like quicksort) is a classic example of a
-divide-and-conquer- algorithm. It sorts the input keys by recursively
dividing them in half, sorting each half, and merging the two sorted
halves. For example, on input C A L I F O R N I A, top-down mergesort
would proceed as follows:

  C A L I F O R N I A

divide:

  C A L I F | O R N I A
  C A L | I F | O R N | I A
  C A | L | I F | O R | N | I A
  C | A | L | I | F | O | R | N | I | A

merge/conquer:

  A C | L | F I | O R | N | A I
  A C L | F I | N O R | A I
  A C F I L | A I N O R
  A A C F I I L N O R
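The divide-and-merge steps of the trace above can be sketched in code.
This is a minimal top-down mergesort in the style of SW section 2.2
(the class name MergeDemo is my own; I use a char array so we can run
it directly on the letters of CALIFORNIA):

```java
// A minimal top-down mergesort sketch (after SW section 2.2),
// run on the letters of CALIFORNIA. MergeDemo is an illustrative
// name, not the course's accompanying code.
public class MergeDemo {

    public static void sort(char[] a) {
        char[] aux = new char[a.length];
        sort(a, aux, 0, a.length - 1);
    }

    private static void sort(char[] a, char[] aux, int lo, int hi) {
        if (hi <= lo) return;             // one key: already sorted
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);            // divide: sort left half
        sort(a, aux, mid + 1, hi);        // divide: sort right half
        merge(a, aux, lo, mid, hi);       // conquer: merge the halves
    }

    // Merge the sorted runs a[lo..mid] and a[mid+1..hi].
    private static void merge(char[] a, char[] aux, int lo, int mid, int hi) {
        for (int k = lo; k <= hi; k++) aux[k] = a[k];
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)         a[k] = aux[j++]; // left run used up
            else if (j > hi)          a[k] = aux[i++]; // right run used up
            else if (aux[j] < aux[i]) a[k] = aux[j++]; // take smaller key
            else                      a[k] = aux[i++];
        }
    }

    public static void main(String[] args) {
        char[] keys = "CALIFORNIA".toCharArray();
        sort(keys);
        System.out.println(new String(keys)); // prints AACFIILNOR
    }
}
```

Each of the log N levels of recursion merges N keys in total, which is
where the O(\N.N log N) comparison count comes from.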