file: 04.2.analysisOfWork.txt
author: Bob Muller
date: February 11, 2014

CS102 Computer Science 102
Spring 2014
Lecture Notes for Meeting 1 of Week 5

Topics:

0. Revisiting the ResizingArray ADTs
1. Searching for an item in an Array

-----------------------------------------------------------------

Notes:

0. There have been further questions about how -static- works. Here's
a simple example:

  public class A {

    // A static function.
    //
    public static void greeting1() { System.out.println("Hello 1"); }

    // The following one is NOT static.
    //
    public void greeting2() { System.out.println("Hello 2"); }

    public static void main(String[] args) {
      A.greeting1();    // NB: the class name A!

      A a = new A();
      a.greeting2();    // NB: no class name, just a variable 'a',
                        // the value of which is of type A.
    }
  }

-----------------------------------------------------------------

0. Revisiting the ResizingArray ADTs

We'll consider representations of Stacks here, but representations of
Queues would have served the same purpose. Our old friend, the
fixed-size sequential Stack, had a push operation written something
like:

  public void push(T info) {
    this.a[this.N] = info;
    this.N = this.N + 1;
  }

So it's fair and reasonable for a client of our ADT to ask: if I use
the push operation, how much -work- will be done? By work, we mean
primarily how long it will take, but we are also interested in how
much storage will be required.

In the case above, in addition to the call of the push function, two
assignment statements must be executed. We can safely ascribe a unit
time to each task and report that the push operation requires 3 steps.

Resizing Arrays

Of course the fixed stacks could overflow, so we provided a more
advanced implementation that resizes the array when push finds the
stack full:

  // resize the underlying array holding the elements
  //
  01. private void resize(int capacity) {
  02.   T[] temp = (T[]) new Object[capacity];
  03.   for (int i = 0; i < N; i++) {
  04.     temp[i] = this.a[i];
  05.   }
  06.   this.a = temp;
  07. }

  08. public void push(T info) {
        // double the size of the array if necessary.
        //
  09.   if (this.N == a.length) resize(2 * a.length);
  10.   this.a[this.N] = info;   // add info to the stack.
  11.   this.N = this.N + 1;
  12. }

Now, in thinking about how to answer the "how much work?" question, we
see that the answer depends. If a call of push happens to occur when
the stack isn't full, the answer is that it will take FOUR units of
time (we had to add one unit for the comparison done in line 09). But
if, on the other hand, the push happens to occur when the stack is
full, then the answer -depends on- N, the number of elements in the
stack. Counting up the steps for a full-stack push, we have:

- 1 step for the call of push,
- 2 steps in line 09, the compare and the call of resize,
- 1 step in line 10 and
- 1 step in line 11.

In resize we have:

- 1 step in line 02,
- 1 step in line 03,
- N steps in line 04 and
- 1 step in line 06.

So all told, push requires N + 8 steps of work when the stack of size
N is full.

An important point: when we say "depends on N" we mean that -it is a
function of N-; that is, the amount of work that a piece of code is
required to do is given by a function that accepts the size of the
input and produces the amount of work as a result.

We will use the Greek symbol lambda (when in text, we use \) to denote
an anonymous function. For example, \N.6 is an anonymous function of
one variable N. It returns 6 every time it is called. We might be more
familiar with this function introduced with a (useless) name f:

  f(N) = 6

This gives the uninformative name "f" to the function. We could as
well have written:

  f = \N.6

So when we refer to the amount of work a given piece of code, say
-myCode-, carries out, we will write it as:

  work(myCode) = \N.expression

where N is an input variable, the -formal parameter-, and where
expression is an integer formula quantifying the amount of work
required.
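To watch the N-dependent cost of a full-stack push concretely, here is a
minimal self-contained sketch (the class name CountingStack and the copies
counter are my own illustrative additions, not part of the course code); it
uses a plain int array to sidestep the generic-array cast, and counts the
element copies that resize performs:

```java
// A sketch of the resizing stack that counts copying work.
// CountingStack and the 'copies' field are illustrative names,
// not from the course's ResizingArray code.
public class CountingStack {
    private int[] a = new int[1]; // start with capacity 1
    private int N = 0;            // number of items on the stack
    public int copies = 0;        // total elements copied by resize

    private void resize(int capacity) {
        int[] temp = new int[capacity];
        for (int i = 0; i < N; i++) {
            temp[i] = a[i];
            copies = copies + 1;  // one unit of copying work
        }
        a = temp;
    }

    public void push(int info) {
        if (N == a.length) resize(2 * a.length); // double when full
        a[N] = info;
        N = N + 1;
    }

    public int size() { return N; }

    public static void main(String[] args) {
        CountingStack s = new CountingStack();
        for (int i = 0; i < 16; i++) s.push(i);
        // Doubling from capacity 1 copies 1 + 2 + 4 + 8 = 15 elements
        // over 16 pushes: under one copy per push on average, even
        // though the individual resizing pushes cost N + 8 steps.
        System.out.println(s.size() + " items, " + s.copies + " copies");
    }
}
```

Note that although a single push on a full stack does work that grows with
N, the doubling strategy spreads that cost out, so a long run of pushes
averages only a constant amount of copying per push.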
For example,

  work(pushFixed)    = \N.3       ---- constant work of 3 units
  work(pushResizing) = \N.N + 8   ---- linear growth of work
  work(LinearSearch) = \N.N       ---- linear
  work(BinarySearch) = \N.log N   ---- logarithmic growth

Big O Notation

It turns out that we usually don't need to know the -exact- number of
steps required; we are primarily interested in the asymptotic behavior
of work functions. That is, we are interested in the amount of work
they do on large inputs, and we will ignore properties of work
functions that don't matter when N is large. This idea is captured by
the three BIG functions:

- O(\N.exp)     --- characterizes asymptotic upper bounds,
- Omega(\N.exp) --- characterizes asymptotic lower bounds,
- Theta(\N.exp) --- characterizes asymptotically tight bounds above
                    AND below.

  O(g) = { f | exist c > 0, n0 st forall n > n0: f(n) <= c * g(n) }

In English, O(g) is the set of functions f that are bounded above by
g: for all n above a threshold n0, f(n) stays within a constant factor
c of g(n).

For example, \N.N + 8 is in O(\N.N). Why? Choose n0 = 8 and c = 2:
for all n > 8 we have n + 8 <= 2n.

There are just a few functions of N that serve as representatives of
the "categories" of functions we will encounter:

  \N.C       --- requires C units of work, where C is some constant,
  \N.log_2 N --- requires a logarithmic number of steps in N,
  \N.N       --- requires N units of work, i.e., it is linear,
  \N.N^2     --- requires a quadratic number of steps,
  \N.N^3     --- requires a cubic number of steps,
  \N.2^N     --- requires an exponential number of steps, completely
                 impractical for all but small values of N.

Simple Search

In the accompanying code, we consider the problem of looking for an
integer -key- in an array of N keys that are not stored in any order.
Looking through sequentially from 0 through N-1, we do linear work.
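The linear and logarithmic searches mentioned above can be sketched as
follows (this is my own minimal version, not the course's accompanying
code; the class and method names are illustrative):

```java
// A sketch contrasting linear search (\N.N work) with binary
// search on a sorted array (\N.log N work). Names are illustrative.
public class SearchDemo {

    // Linear search: examine keys 0 through N-1 in turn.
    // Up to N comparisons. Returns the index of key, or -1.
    public static int linearSearch(int key, int[] a) {
        for (int i = 0; i < a.length; i++)
            if (a[i] == key) return i;
        return -1;
    }

    // Binary search on a SORTED array: each comparison halves the
    // remaining range, so about log_2 N comparisons suffice.
    public static int binarySearch(int key, int[] a) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if      (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else return mid;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 13, 21, 34};
        System.out.println(linearSearch(13, sorted)); // prints 3
        System.out.println(binarySearch(13, sorted)); // prints 3
        System.out.println(binarySearch(7, sorted));  // prints -1
    }
}
```

The halving is why the whitelist experiment below speeds up so
dramatically once the entries are sorted: for N = 10^6, linear search
may make a million comparisons per key, while binary search makes
about twenty.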
In the case of the largeW.txt whitelist file, which has 10^6 entries,
and the largeT.txt keys file, which has 10^7 entries, it took my
computer a little over 60 minutes to process all of the keys.

In the special case in which the whitelist entries are in sorted
order, we can use the more efficient binary search algorithm. Looking
through a sorted version of the 10^6 entries in the whitelist file for
the same 10^7 keys in the largeT.txt file, my computer completed the
task in 23 seconds.

Sorting

Sorting algorithms are the subject of section 2 of SW. Section 2.1
covers elementary algorithms, including insertion sort and selection
sort. Also, see David Martin's sorting animations website:

  http://www.sorting-algorithms.com/

These elementary sorting algorithms have O(\N.N^2) complexity in the
average and worst cases. (Although insertion sort is linear in the
best case of nearly sorted input!)

Section 2.2 of SW covers von Neumann's mergesort algorithm, section
2.3 covers Hoare's quicksort algorithm and section 2.4 covers heapsort
(as well as priority queues!). On average, all of mergesort, quicksort
and heapsort can sort keys using O(\N.N log N) comparisons of keys. It
turns out that quicksort has quadratic complexity in the worst case,
so if your application must absolutely run quickly in all cases,
you'll probably want to avoid quicksort.

Mergesort

Mergesort (like quicksort) is a classic example of a
-divide-and-conquer- algorithm. It sorts the input keys by recursively
dividing them in half, sorting each half, and merging the two sorted
halves. For example, on input C A L I F O R N I A, top-down mergesort
would proceed as follows:

  C A L I F O R N I A

divide:

  C A L I F | O R N I A
  C A L | I F | O R N | I A
  C A | L | I F | O R | N | I A
  C | A | L | I | F | O | R | N | I | A

merge/conquer:

  A C | L | F I | O R | N | A I
  A C L | F I | N O R | A I
  A C F I L | A I N O R
  A A C F I I L N O R
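The divide-and-merge steps of the trace above can be sketched in code.
This is a minimal top-down mergesort in the style of SW section 2.2
(the class name MergeDemo is my own; I use a char array so we can run
it directly on the letters of CALIFORNIA):

```java
// A minimal top-down mergesort sketch (after SW section 2.2),
// run on the letters of CALIFORNIA. MergeDemo is an illustrative
// name, not the course's accompanying code.
public class MergeDemo {

    public static void sort(char[] a) {
        char[] aux = new char[a.length];
        sort(a, aux, 0, a.length - 1);
    }

    private static void sort(char[] a, char[] aux, int lo, int hi) {
        if (hi <= lo) return;             // one key: already sorted
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);            // divide: sort left half
        sort(a, aux, mid + 1, hi);        // divide: sort right half
        merge(a, aux, lo, mid, hi);       // conquer: merge the halves
    }

    // Merge the sorted runs a[lo..mid] and a[mid+1..hi].
    private static void merge(char[] a, char[] aux, int lo, int mid, int hi) {
        for (int k = lo; k <= hi; k++) aux[k] = a[k];
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)         a[k] = aux[j++]; // left run used up
            else if (j > hi)          a[k] = aux[i++]; // right run used up
            else if (aux[j] < aux[i]) a[k] = aux[j++]; // take smaller key
            else                      a[k] = aux[i++];
        }
    }

    public static void main(String[] args) {
        char[] keys = "CALIFORNIA".toCharArray();
        sort(keys);
        System.out.println(new String(keys)); // prints AACFIILNOR
    }
}
```

Each of the log N levels of recursion merges N keys in total, which is
where the O(\N.N log N) comparison count comes from.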