Csc 125 online - Computer Science II, Programming in C++
Project 2 
Due: Sunday, October 1, 2006 (Monday at 3:00 am)

Comparing a linked-list and binary search tree Word Counter.

 

Data Structures

A Data Structure is a method for organizing data in the main memory of the computer so that it is easy to access and process.  Usually data is inputted from some source, such as a file, and stored in a data structure.  It is then processed in main memory, because that is the fastest way to process a large amount of data.  Finally, it is outputted to some destination, often also a file, or maybe the screen.

You already know one of the most common data structures, the array.   If you think about the above definition, the array fits it perfectly.  However, there are a number of other data structures that are used in a wide variety of computer programs.  The two most common, other than the array, are the linked-list and the binary search tree.  This project will give you practice with both.  You will write two programs that do identical tasks.  One will store the data in a linked-list and the other will store the data in a binary tree.  You will compare how long it takes to load the data into each data structure.

This project will also give you some more practice creating Abstract Data Types.


Assignment

Write two complete C++ programs that perform the following steps:

1. Get a file name from the user, assumed to be an ordinary text file.
2. Start the Timer (I'll give you the code).
3. Read the file and store each word in main memory, using a linked list for the first program and a binary search tree for the second program.  Do not have any duplicate nodes for the same word.  Instead, keep an integer count for each word in the node for that word.
4. When your program gets to the end of the file, stop the Timer and report how long it took.
5. Go into a loop that gets a word from the user, and searches the data structure to find out how many times that word appears in the file.  Stop the loop when the user types in "q!".
6. For the Linked List version:  A linked list will store the words in alphabetical order only if you program it to maintain alphabetical order as you enter each node.  You must program your linked list so that it maintains alphabetical order as you insert the words.  You may not sort the link list after the file has been read, since a sort at the end would be very inefficient.  At the end, do a simple walk of the linked list and print to the screen each word in and how many times that word appeared.  
    For the Binary Search Tree version:  Since a correct binary search tree will always store its words in alphabetical order, do an in-order traversal of the binary search tree, printing to the screen the word in each node and how many times that word appeared in the text file. 
 

Hints and supplied code

You can find code that you are free to include and use in your program at  ~sbadman/csc125traditional/project2start/    It contains a running program that does all of the above, except implementing the linked list or binary search tree Abstract Data Types.  The suggested locations for calls to those Abstract Data Types are included as comments in the code.

You can see a working executable of the finished project using a binary search tree at ~sbadman/csc125traditional/project2solution/

Code on the Internet will almost always implement a binary tree using recursion, and often will use recursion to implement a linked list.  We will study recursion later, and will study the difference between an iterative and a recursive solution for binary search trees.  If you don't know what I am talking about, that is fine.  Implement your linked list and tree functions using while loops.  You may not use recursion in this project, even if you know how.  (For those of you who know how, it will be good practice to implement a non-recursive solution).

You are welcome to modify or completely re-write the supplied code, but that is not necessary for this project.  I suggest you concentrate on creating the Linked List and Binary Search Tree Abstract Data Types (i.e. classes).

I also suggest that you complete the Linked List Abstract Data Type first, and get the program working completely correctly using the linked list.  Once you understand the linked list code, you will find that the binary search tree is very similar.  Linked lists have a pointer to the next node in the list.  Binary search trees have two pointers, one to the left child node and one to the right child node.  Because of the two choices, binary search tree code is a little more complicated than the linked list code, but there are no significant conceptual differences.

It is standard style to create a separate class for the nodes of the list or tree then a different class that represents the entire list or tree.  A common mistake that students make is to try to create a single class that does everything.  A list or tree has to consist of nodes, and it is much easier to have each node to be an object of a Node class, and then have a single list or tree object that gives access to the data in the nodes, but keeps direct access to the node objects within itself.

It is also standard style to make the list or tree classes the friend of the Node classes.  The code looks like this:

class ListNode
{
friend class LinkedList;
public:
  // other code
private:
  Node nextNode; 
};

The friend statement in the code above lets any function in the LinkedList class access to the private data in any Node object.  Thus the LinkedList object can manipulate the nextNode  pointers directly.  This is a violation of good object oriented design, but is normally accepted in data structures because efficient manipulation of pointers is critical to the performance of the program.   Good object oriented design would require a getter and setter function in class Node.  That would add two function calls to each pointer assignment, however, and would slow down the execution of the program significantly.   You are welcome to use  friend statements in your code if you wish.  

There are three keys to understanding the code for both linked lists and binary search trees:

1.  Use diagrams!  Diagrams similar to the green boxes in Deitel's section 21.4 are absolutely necessary for understanding data structures.  Before you even start coding, be sure you understand the algorithm using diagrams.  Then if you are stuck as you are coding, draw a diagram, and try to change the pointers to follow what your code says. 

2.  A node class (or struct in C) can contain a pointer to a another node of the same class.  Although this would never work in C or C++:

class Node
{
  // other code
private:
  Node nextNode; 
};

because an object of class Node would contain a complete object of class Node, which would contain a complete object of class Node, which would contain a complete object of class Node, which would ....

However, this does work:

class Node
{
  // other code
private:
  Node* nextNode; 
};

because a pointer to an object is completely different from an object.  A pointer is just 4 bytes in size and cannot "contain" anything but the pointer itself.  There is no object inside another object.

Note that in Java, you can have the following definition:

public class Node
{
  // other code
private:
  Node nextNode; 
}

because all object names in Java are references, so that it is impossible for an object to be actually contained inside another object.

3.  Pointers are variables, and can be assigned a value in a regular assignment statement.  Just don't dereference the pointer and the pointer (i.e. the address) will be copied instead of the value it points at.  It usually looks something like this:

 currentListNode = currentListNode->next;

or

 currentTreeNode = currentTreeNode->left;
 currentTreeNode = currentTreeNode->right;

In the above assignment statements the  currentListNode pointer variable now points to the next Node in the list.  The currentTreeNode pointer now points at the left or right child Node.

 

 

This is the algorithm for the non-recursive walk of a tree:

It requires the use of a stack of nodes.  The common operations on a stack are to "push" on top of the stack, and to "pop" off the top of the stack.  Stacks are very simple to implement.  Here is an implementation for this project:

#define EMPTY -1
#define SIZE 100
TreeNode* stack[SIZE];
int top = EMPTY

To push the current node onto the stack, you can use this one line:
stack[++top] = current;

To pop the top node off the stack into current, you can use this line:
current = stack[top--];

The non recursive print algorithm looks like this in pseudocode:

IF root NOT EQUAL NULL
    current = root
    WHILE current NOT EQUAL NULL
        PUSH current
        current = current->left
    END WHILE
    WHILE stack NOT EMPTY
        POP INTO current
        PRINT current's data
        IF (current->right)
            current = current->right
            WHILE current NOT EQUAL NULL
                PUSH current
                current = current->left
            END WHILE
        END IF
    END WHILE
END IF
ELSE
    PRINT "Empty Tree"
END ELSE

 

Requirements

Your programs must be identical except for using a Linked-List Abstract Data Type and a Binary Search Tree Abstract Data Type.

The Linked-List and Binary Search Tree Abstract Data Types must have identical public interfaces.  The only difference between the two data types can be the internal implementation of the public interface, one using a linked list and the other using a binary search tree.

Put the classes for the Linked-List in their own .cpp and .h file, and the classes for the Binary Search Tree in their own .cpp and .h file.

You can put each of your two programs in a separate subdirectory of the normal directory you submit, or you can include everything in a single program if you include a simple method to comment and uncomment which data structure is used.  For instance, if the only code difference to use the two data types is changing which of the following lines is commented, you can use a single program for both data types:

 LinkedList data;
 //BinaryTree data;


Your project must implement the
concepts of Data Encapsulation inside a well designed Abstract Data Type for both data structures.

All data stored inside your class must be private.  Any functions that are not part of the public interface of your classes must be private.

Your program cannot use any global variables.

You are encouraged to use the supplied code at  ~sbadman/csc125traditional/project2start/  as is, but you are welcome to change it if you want.  

Your
 
int main() must be a short simple function definition that merely creates a object of your class, and then tests it by calling all of its public methods.  Each of the two data structures must be in their own .h and .cpp files.

Yo
ur program must be written for a Linux system, using ANSI C++ and good Object Oriented C++ programming style.  Before grading, you must  check it on Parkland's Linux computers for proper operation.   You may not use any non-ANSI C or C++ extensions.


Grading

Project 2 is worth 20 points toward the final grade.  It will be graded according to the criteria on the Project 2 Grading Criteria. 


Date

Your project must be properly submitted into the ~sbadman/csc125/students/project2/  directory by 3:00 am Monday, October 2nd.   You may have your project graded before October 2nd, if you wish, by making arrangements with the instructor.  If you have your project graded early, you will still have until 3:00 am Monday to re-submit your program with corrections. 

  
Back to Csc 125 - Computer Science II, Programming in C++
  Scott Badman   Office: B132   Phone: 353-2250   sbadman@parkland.edu  

Parkland College, 2400 W. Bradley Avenue, Champaign, IL 61821