|
| Csc 125 online - Computer Science II, Programming in C++ |
| Project 2 |
| Due: Sunday, October 1, 2006 (Monday at 3:00 am) |
A Data Structure is a method for organizing data in the main memory of the computer so that it is easy to access and process. Usually data is inputted from some source, such as a file, and stored in a data structure. It is then processed in main memory, because that is the fastest way to process a large amount of data. Finally, it is outputted to some destination, often also a file, or maybe the screen.
You already know one of the most common data structures, the array. If you think about the above definition, the array fits it perfectly. However, there are a number of other data structures that are used in a wide variety of computer programs. The two most common, other than the array, are the linked-list and the binary search tree. This project will give you practice with both. You will write two programs that do identical tasks. One will store the data in a linked-list and the other will store the data in a binary tree. You will compare how long it takes to load the data into each data structure.
This project will also give you some more practice creating Abstract Data Types.
1. Get a file name from the user, assumed to be an ordinary
text file.
2. Start the Timer (I'll give you the code).
3. Read the file and store each word in main memory, using a linked list for the
first program and a binary search tree for the second program. Do not have
any duplicate nodes for the same word. Instead, keep an integer count for
each word in the node for that word.
4. When your program gets to the end of the file, stop the Timer and report how
long it took.
5. Go into a loop that gets a word from the user, and searches the data
structure to find out how many times that word appears in the file. Stop
the loop when the user types in "q!".
6. For the Linked List version:
A linked list will store the words in alphabetical order only if you program it
to maintain alphabetical order as you enter each node. You must program
your linked list so that it maintains alphabetical order as you insert the
words. You may not sort the link list after the file has been read, since
a sort at the end would be very inefficient. At the end, do a simple walk
of the linked list and print to the screen each word in and how many times that
word appeared.
For the Binary Search Tree version: Since a
correct binary search tree will always store its words in alphabetical order, do
an in-order traversal of the binary search tree, printing to the screen the word
in each node and how many times that word appeared in the text file.
You can see a working executable of the finished project using a binary search tree at ~sbadman/csc125traditional/project2solution/
Code on the Internet will almost always implement a binary tree using recursion, and often will use recursion to implement a linked list. We will study recursion later, and will study the difference between an iterative and a recursive solution for binary search trees. If you don't know what I am talking about, that is fine. Implement your linked list and tree functions using while loops. You may not use recursion in this project, even if you know how. (For those of you who know how, it will be good practice to implement a non-recursive solution).
You are welcome to modify or completely re-write the supplied code, but that is not necessary for this project. I suggest you concentrate on creating the Linked List and Binary Search Tree Abstract Data Types (i.e. classes).
I also suggest that you complete the Linked List Abstract Data Type first, and get the program working completely correctly using the linked list. Once you understand the linked list code, you will find that the binary search tree is very similar. Linked lists have a pointer to the next node in the list. Binary search trees have two pointers, one to the left child node and one to the right child node. Because of the two choices, binary search tree code is a little more complicated than the linked list code, but there are no significant conceptual differences.
It is standard style to create a separate class for the nodes of the list or tree then a different class that represents the entire list or tree. A common mistake that students make is to try to create a single class that does everything. A list or tree has to consist of nodes, and it is much easier to have each node to be an object of a Node class, and then have a single list or tree object that gives access to the data in the nodes, but keeps direct access to the node objects within itself.
It is also standard style to make the
list or tree classes the friend of the Node classes. The code looks like
this:
class ListNode
{
friend class LinkedList;
public:
// other code
private:
Node nextNode;
};
The friend
statement in the code above lets any function in the
LinkedList class access to the private data in any Node object. Thus the
LinkedList object can manipulate the nextNode
pointers directly. This is a violation of good object oriented design, but
is normally accepted in data structures because efficient manipulation of
pointers is critical to the performance of the program. Good object
oriented design would require a getter and setter function in class Node.
That would add two function calls to each pointer assignment, however, and would
slow down the execution of the program significantly. You are
welcome to use friend
statements in your code if you wish.
There are three keys to understanding the code for both linked lists and binary search trees:
1. Use diagrams! Diagrams similar to the green boxes in Deitel's section 21.4 are absolutely necessary for understanding data structures. Before you even start coding, be sure you understand the algorithm using diagrams. Then if you are stuck as you are coding, draw a diagram, and try to change the pointers to follow what your code says.
2. A node class (or
struct in C) can contain a pointer to a another node of the same class.
Although this would never work in C or C++:
class Node
{
// other code
private:
Node nextNode;
};
because an object of class Node would contain a
complete object of class Node, which would contain a complete object of class
Node, which would contain a complete object of class Node, which would ....
However, this does work:
class Node
{
// other code
private:
Node* nextNode;
};
because a pointer to an object is completely different
from an object. A pointer is just 4 bytes in size and cannot "contain"
anything but the pointer itself. There is no object inside another object.
Note that in Java, you can have
the following definition:
public class Node
{
// other code
private:
Node nextNode;
}
because all object names in Java are
references, so that it is impossible for an object to be actually contained
inside another object.
3. Pointers are variables, and can be assigned a value in a regular assignment statement. Just don't dereference the pointer and the pointer (i.e. the address) will be copied instead of the value it points at. It usually looks something like this:
currentListNode =
currentListNode->next;
or
currentTreeNode =
currentTreeNode->left;
currentTreeNode = currentTreeNode->right;
In the above assignment
statements the currentListNode
pointer variable now points to the next Node in the list.
The currentTreeNode
pointer now points at the left or right child Node.
This is the algorithm for the non-recursive walk of a tree:
It requires the use of a stack of nodes. The common operations on a stack are to "push" on top of the stack, and to "pop" off the top of the stack. Stacks are very simple to implement. Here is an implementation for this project:
#define EMPTY -1
#define SIZE 100
TreeNode* stack[SIZE];
int top = EMPTY
To push the current node onto the stack, you can use this one line:
stack[++top] = current;
To pop the top node off the stack into current, you can use this
line:
current = stack[top--];
The non recursive print
algorithm looks like this in pseudocode:
IF root NOT EQUAL NULL
current = root
WHILE current NOT EQUAL NULL
PUSH current
current = current->left
END WHILE
WHILE stack NOT EMPTY
POP INTO current
PRINT current's data
IF (current->right)
current =
current->right
WHILE current
NOT EQUAL NULL
PUSH current
current = current->left
END WHILE
END IF
END WHILE
END IF
ELSE
PRINT "Empty Tree"
END ELSE
Your programs must be identical except for using a Linked-List Abstract Data Type and a Binary Search Tree Abstract Data Type.
The Linked-List and Binary Search Tree Abstract Data Types must have identical public interfaces. The only difference between the two data types can be the internal implementation of the public interface, one using a linked list and the other using a binary search tree.
Put the classes for the Linked-List in their own .cpp and .h file, and the classes for the Binary Search Tree in their own .cpp and .h file.
You can put each of your two programs in
a separate subdirectory of the normal directory you submit, or you can include
everything in a single program if you include a simple method to comment and
uncomment which data structure is used. For instance, if the only code
difference to use the two data types is changing which of the following lines is
commented, you can use a single program for both data types:
LinkedList data;
//BinaryTree data;
Your project must implement the
concepts of Data Encapsulation inside a well designed
Abstract Data Type for both data structures.
All data stored inside your class must be
private. Any functions
that are not part of the public interface of your classes must be private.
Your program cannot use any global variables.
You are encouraged to use the supplied
code at ~sbadman/csc125traditional/project2start/ as is, but you are
welcome to change it if you want.
Your int
main() must be a short simple function definition
that merely creates a object
of your class, and then tests it by calling all of its public methods.
Each of the two data structures must be in their own .h and .cpp files.
Your program
must be written for a Linux system, using ANSI C++ and good Object Oriented C++ programming
style. Before grading, you must check it on Parkland's Linux computers
for proper operation. You may not use any non-ANSI C or C++ extensions.
Grading
Project 2 is worth 20 points toward the final grade. It will be graded according to the criteria on the Project 2 Grading Criteria.
Your project must be properly submitted into the ~sbadman/csc125/students/project2/ directory by 3:00 am Monday, October 2nd. You may have your project graded before October 2nd, if you wish, by making arrangements with the instructor. If you have your project graded early, you will still have until 3:00 am Monday to re-submit your program with corrections.
| Back to Csc 125 - Computer Science II, Programming in C++ |
| Scott Badman Office: B132 Phone: 353-2250 sbadman@parkland.edu |
Parkland College, 2400 W. Bradley Avenue, Champaign, IL 61821 |