Discussion 6 (07-07)

We are now finished with the section on Java. There, you guys know Java now. Of course, after working with the language for the last couple of weeks, and recently studying for the midterm, you realize how many details it (and any programming language) has. For must of us, there are just way too many to remember. So, what can we do?

Learn by example - Read and write lots of code.

Remember the big ideas - Look up the details.

Use the right language for the job - Know which languages and programming paradigms are good for which tasks.

From this point on, Java is just a tool we use to do other things. Now, onto Data Structures!

Linked Lists are absolutely fundamental in the study of data structures. Besides being one of the most used structures (in any type of programming), Linked Lists also introduce us to the basic components (essentially nodes and pointers) and to techniques used to design, improve, implement, and analyze all sorts of structures.

Linked Lists are fundamentally about indirection, about taking an indirect route from some known starting point to the desired destination. This is an important idea in data structures and in the real world. For example, suppose that you want to find out some information about enrollment. When you call the Registrars office, they might refer you to the Dean's office. The number they give you can be seen as a pointer, as a reference, from your starting point (the Registrars office) to the next point in your indirect route. Now suppose the Dean's office refers you to your Department. The Dean's office serves as a node and the number they give you as another pointer. Finally, suppose that your Department tells you to talk to the Registrar's office. In this case, you have a cycle (a topic we will cover later).

Conceptually, the Linked List variations we will cover should be fairly easy to understand. But to really learn them, to really get familiar with the complexities and details, I recommend that you implement your own Linked List (preferably of the circularly, doubly linked variety, which we will see in a minute).

The singly linked list is the most simple to understand and implement. Below is a sample list, and the code describing the fields needed in each node.

There are three elements in this list. The firt two point to the next element, while the final element points to null (since there isn't a next element to point to). Each element also points to its content, which can be any Object, but happens to be Integers in this case.

The previous version is simple to think about and code, but hard to actually use. The user is required, for example, to create the new nodes and link them to existing nodes. Also, what happens when the list is empty? We now add a wrapper class that implements methods commonly used on linked lists, so the user doesn't have to worry about the details. As part of this, the LinkedList class (the wrapper) keeps a head node, which is never deleted, as a sort of foothold. This style is also better in terms of encapsulation, as the user now never has to deal directly with the nodes of the list.

We now add a previous node pointer to each node. This makes backward or back-and-forth traversal of the list more efficient. As part of this, the wrapper class also gets a tail node.

In the final variation we will look at, we get rid of the tail node and link the last node in the list back to the head (which is also linked to the first node). This structure has the same advantages as the non-circular version, but doesn't require any special cases (insertion, for example, is always the same), or the extra tail node.

The linked list variations serve as a great example of the classic tradeoffs between complexity, efficiency, and elegance. For example, the singly linked list without a wrapper is extremely simple, but it isn't very efficient (we can't go backward and can't quickly gain access to the end of the list) and isn't very elegant (the user has to manage the list themselves). On the other hand, the circularly, doubly linked list with wrapper is more complex, but is also more efficient (we can easily move backward and forward and get to the end of the list quickly) and elegant (most of the work is done for the user and, within the wrapper class, many of the special cases have been eliminated).

Before discussing Iterators, let's introduce a notion of performance/efficiency in the context of Linked Lists. When measuring the cost of some operation on a Linked List, we will count the number of nodes "visited". In general, this measurement is proportional to the time it takes to complete a particular operation.

An Iterator is essentially a (nicely encapsulated) pointer to some position in a Linked List.

Performance
Suppose we want to sequentially access each element in a list. We could call get(index) on the list wrapper class with index from 0 to the length of the list (we will call this "n"). But this will cost us 1 (to visit the first node) + 2 (to visit the first node and then the second node) + ... + n = n^2/2. This is a high cost to pay when we are only looking at n elements.

Iterators can help here. Internally, Iterators keep track of their current position in the list they are associated with. When we ask the Iterator to look at the next element in the list, it need only ask its current node for the next node in the list, instead of having to start at the beginning each time. In this case the cost of iterating through the list is n. As a side not, notice the correspondance in terminology between sequentially accessing the elements in a list, iterating, and a tool that makes doing so easier and more efficient, and iterator.

Common Interface / Abstraction
Many of the data structures in the Java Standard Library provide iterators. These iterators allow us to abstract from the details of each particular structure and work with them using a common interface. As an extension of this, the code used to iterate through some sequence generally takes a particular, familiar form. As a result, it immediately becomes clear, to the person reading the code, what is being done. Below is an example of how Java Iterators are generally used.

As an exercise, let's take a circularly, doubly linked list and see how we can improve its performance.

   ---------------------------------------------------------------
   |  ---   ---   ---   ---   ---   ---   ---   ---   ---   ---  |
   -->|H|<->|5|<->|2|<->|9|<->|5|<->|1|<->|6|<->|0|<->|9|<->|2|<--
      ---   ---   ---   ---   ---   ---   ---   ---   ---   ---   
Index:       0     1     2     3     4     5     6     7     8

1. If a request is made for the node at an index greater than half the list size, access that node by moving backward from the head. For example, if a request is made for the element at index 6, use the prev pointer on the head to go to the node at index 7, and then the node at index 6. In this way, we can gurantee that getting to a node will take, at most, half the list length.

2. Suppose we don't have iterators. We could simulate the benefits they provide by including a "roving"/"current" pointer in our list class. Any time a request is made for the element at a particular index, the list would determine if it is more efficient to move forward from the head, move backward from the head, or move forward or backward from the roving pointer. The roving pointer would also naturally increase the efficiency of iterating through the list to a linear cost operation.

   ---------------------------------------------------------------
   |  ---   ---   ---   ---   ---   ---   ---   ---   ---   ---  |
   -->|H|<->|5|<->|2|<->|9|<->|5|<->|1|<->|6|<->|0|<->|9|<->|2|<--
      ---   ---   ---   ---   ---   ---   ---   ---   ---   ---   
Index:       0     1     2     3     4     5     6     7     8    
                        /|\                                       
                         | - "Current"

3. Why not extend this idea? What if we included several "roving" pointers? What if we had a whole set of them that pointed into the list at fixed intervals (let's say every 3rd node)? How would we manage all of these extra pointers in a variable sized list? Well, why not be recursive about it and use a linked list to keep track of them?

         ---------------------------------------------            
         |  ---               ---               ---  |            
         -->| |<------------->| |<------------->| |<--            
            ---               ---               ---               
             |                 |                 |                
   --...    \|/               \|/               \|/          ...--
   |  ---   ---   ---   ---   ---   ---   ---   ---   ---   ---  |
   -->|H|<->|5|<->|2|<->|9|<->|5|<->|1|<->|6|<->|0|<->|9|<->|2|<--
      ---   ---   ---   ---   ---   ---   ---   ---   ---   ---   
Index:       0     1     2     3     4     5     6     7     8

Accessing index 4, for example, will now only cost 4, compared to the previous cost of 5.

Why not extend this idea all the way? What if we added several pointers to each node in the recursive lists? What if we also recurred all the way until the highest level list contained only one node?

                                    ---                           
                   -----------------|R|-----------------          
                   |                ---                |          
                   |                 |                 |          
                  \|/               \|/               \|/         
                  ---               ---               ---         
             -----| |-----     -----| |-----     -----| |-----    
             |    ---    |     |    ---    |     |    ---    |    
             |     |     |     |     |     |     |     |     |    
   --...    \|/   \|/   \|/   \|/   \|/   \|/   \|/   \|/   \|/ ..
   |  ---   ---   ---   ---   ---   ---   ---   ---   ---   ---  |
   -->|H|<->|5|<->|2|<->|9|<->|5|<->|1|<->|6|<->|0|<->|9|<->|2|<--
      ---   ---   ---   ---   ---   ---   ---   ---   ---   ---   
Index:       0     1     2     3     4     5     6     7     8

As you can see, every access now costs exactly 3. The structure we have build is called a tree and the single node at the top is called the root. We will cover trees in much more detail later in the class.