Parkland College
2400 West Bradley Avenue, Champaign, Illinois 61821
Csc 220 Data Structures
Take Home Final
Project 4
 
Tuesday, December 11, 2006

Creating a Multi-table Data Base

Data bases and data base design are an extremely important sector of computing.  The Computer Science theory underlying data base design is extensive.  Most modern data bases have been created with the relational data base structure.  Although the details are beyond the scope of this course, relational data bases consist of many tables of data with links and connections between them.  The advantage is that each item of data is stored as few times as possible, usually only once.  If an item of data is the same in multiple records, the records are constructed with links of some type to the single instance of the stored data.  Usually, each table has a unique number assigned to each row in the table.  This number is called the key, and copies of the keys are stored where needed, instead of copies of the data itself.  Another advantage of relational data bases is the ease with which new associations of the data can be created.  Usually, the data itself is never copied or moved; only the various links to the data are configured in a new way.
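As an illustration of keys standing in for data, here is a minimal sketch, not a required design.  The names StateTable, PlaceRecord, keyFor, and stateOf are hypothetical and do not come from main.cpp.  Each state name is stored once, and every record holds only a 4-byte key that points to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical table: each state name is stored exactly once.
// The row index of a name serves as its key.
struct StateTable {
    std::vector<std::string> names;

    // Returns the key for `name`, adding a new row only if it is not yet stored.
    // The linear scan keeps the sketch short; it is not an efficient lookup.
    std::int32_t keyFor(const std::string& name) {
        for (std::size_t i = 0; i < names.size(); ++i)
            if (names[i] == name) return static_cast<std::int32_t>(i);
        names.push_back(name);
        return static_cast<std::int32_t>(names.size() - 1);
    }
};

// Hypothetical record: holds a key instead of a copy of the state name.
struct PlaceRecord {
    std::string placeName;
    std::int32_t stateKey;
};

// Follows the link from a record to the single stored state name.
const std::string& stateOf(const PlaceRecord& rec, const StateTable& states) {
    assert(rec.stateKey >= 0 &&
           static_cast<std::size_t>(rec.stateKey) < states.names.size());
    return states.names[static_cast<std::size_t>(rec.stateKey)];
}
```

Two records in the same state end up with the same stateKey, so the state name itself appears only once in memory.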

 

Assignment

Write a USGeographicNames class that stores the data in us_concise_modified.txt.  The design of class USGeographicNames is up to you, as long as you meet the requirements below.  You may use any code that you have previously developed for this course, or use the Standard Template Library, or any other code that you modify and incorporate into your program, as long as that code implements basic data structures, and not data base concepts directly.

main.cpp is supplied for you.  It reads the file completely, divides the data into individual variables, and suggests function calls to make to your new class.  main.cpp also interacts with the user, makes suggested calls to your class, and prints the results to the screen.

You may create an object oriented design, with additional classes, if you wish, or put all the required functionality into a single class.  Your object oriented design will not be considered in the grade, only the data base design you implement.

You also must create a diagram that shows your data base structure.

 

Requirements

All the data passed to your class may be stored only once in your code, except for seconds and hemispheres in the latitude and longitude.  All other access or relationships to the data must be through links of some kind.

All data must be in structures that are efficient for searches, O(lg n).  Assembly of the data into strings for the result vector must be O(n), where n is the number of strings returned in the vector<string>.  The storage of the data into your data base may be less efficient, if needed.
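One possible way to meet both bounds is sketched below, assuming a Standard Template Library std::map as the search structure.  NameIndex, findByName, and the rows vector are hypothetical names; rows stands in for whatever part of your design turns a row number into a printable record.  The lookup in the map is logarithmic in the number of distinct names, and the loop that fills the result vector is linear in the number of strings returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical index: each name is stored once as a map key,
// with the row numbers of every record that carries that name.
using NameIndex = std::map<std::string, std::vector<std::int32_t>>;

// Appends one string per matching row to `result`.
void findByName(const NameIndex& index,
                const std::vector<std::string>& rows,
                const std::string& name,
                std::vector<std::string>& result) {
    // find() on a std::map takes logarithmic time.
    auto it = index.find(name);
    if (it == index.end()) return;  // no exact match

    // One push_back per returned string.
    for (std::int32_t row : it->second) {
        assert(row >= 0 && static_cast<std::size_t>(row) < rows.size());
        result.push_back(rows[static_cast<std::size_t>(row)]);
    }
}
```

A sorted vector searched with std::lower_bound would give the same search bound; the map is used here only because insertion is simpler to show.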
 
The links can be integer key values, index numbers, or pointers, or any other data type that is no more than 4 bytes in length.  You may also encode data in some way, as long as it does not create additional storage.  For instance, you can create negative latitude and longitude values as Southern Hemisphere and Eastern Hemisphere values respectively.
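The sign encoding mentioned above could look like the following sketch.  The function names are hypothetical.  It follows the convention stated in the text, where Southern and Eastern values are stored as negative numbers.  One caveat under this assumption: a value of exactly 0 degrees cannot carry a sign, so its hemisphere would need to be handled some other way.

```cpp
#include <cassert>
#include <cstdint>

// Folds the hemisphere character into the sign of the degrees value,
// so no separate storage is needed for the character.
// Convention from the text: 'S' and 'E' are stored as negative values.
std::int16_t encodeDegrees(int degrees, char hemisphere) {
    assert(degrees >= 0 && degrees <= 180);
    bool negative = (hemisphere == 'S' || hemisphere == 'E');
    return static_cast<std::int16_t>(negative ? -degrees : degrees);
}

// Recovers the hemisphere character of an encoded latitude.
char latitudeHemisphere(std::int16_t encoded) {
    return encoded < 0 ? 'S' : 'N';
}

// Recovers the hemisphere character of an encoded longitude.
char longitudeHemisphere(std::int16_t encoded) {
    return encoded < 0 ? 'E' : 'W';
}

// Recovers the unsigned degrees for printing.
int degreesOf(std::int16_t encoded) {
    return encoded < 0 ? -encoded : encoded;
}
```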

You may create as many links as you want, pointing in either direction or both directions between two sets of data.  You may create whatever key values you need.  You may also create entire tables of only links, if you need them.
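A table made only of links might be sketched as follows.  CountyStateLinks and statesForCounty are hypothetical names, and the choice of a std::set of key pairs is an assumption, not a requirement.  Each entry pairs a county key with a state key, so neither name is copied, and all the states for one county sit next to each other in sorted order.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>
#include <utility>
#include <vector>

// Hypothetical link table: (county key, state key) pairs and nothing else.
using CountyStateLinks = std::set<std::pair<std::int32_t, std::int32_t>>;

// Returns the keys of every state linked to the given county.
std::vector<std::int32_t> statesForCounty(const CountyStateLinks& links,
                                          std::int32_t countyKey) {
    assert(countyKey >= 0);
    std::vector<std::int32_t> stateKeys;

    // Pairs sort by county key first, so lower_bound with the smallest
    // possible state key lands on the first entry for this county.
    auto it = links.lower_bound(
        std::make_pair(countyKey, std::numeric_limits<std::int32_t>::min()));

    // Walk forward while the entries still belong to this county.
    for (; it != links.end() && it->first == countyKey; ++it)
        stateKeys.push_back(it->second);
    return stateKeys;
}
```

The returned state keys would then be turned into printable names by following each key into wherever the state names are stored.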

You may create additional classes to hold multiple pieces of data, or multiple links, as long as the data is not stored repetitively.

Your class must pass back to main.cpp printable strings in a vector<string> object passed to your class by reference.  The strings in the vector must be the complete and correct data, rearranged, but identical to the data items in the file. 

You may store multiple copies of the latitude seconds and longitude seconds for each line of data and the hemisphere characters, even though they are duplicates, since they are a smaller number of bytes than a normal link.  You may also combine the degrees and minutes of the latitude and longitude into a single piece of data, if you wish, or you may keep them as separate data.
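If you choose to combine degrees and minutes, one simple option is to store the total number of minutes.  This is only a sketch; packMinutes, degreesPart, and minutesPart are hypothetical names, and the sketch assumes non-negative degrees.  A side benefit of this form is that adding a number of minutes to a position, as a box search needs to do, becomes a single integer addition.

```cpp
#include <cassert>
#include <cstdint>

// Combines degrees and minutes into one value: total minutes.
std::int32_t packMinutes(int degrees, int minutes) {
    assert(degrees >= 0);
    assert(minutes >= 0 && minutes < 60);
    return static_cast<std::int32_t>(degrees * 60 + minutes);
}

// Recovers the degrees from a packed value.
int degreesPart(std::int32_t packed) {
    return packed / 60;
}

// Recovers the minutes from a packed value.
int minutesPart(std::int32_t packed) {
    return packed % 60;
}
```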

The data that needs to be displayed for each menu choice in the result strings is as follows.
 

" A: Search for a specific place name and list all data."
" B: List all place names. "
" C: Search a specific county and list all states. "
" D: List all county names. "
" E: List all data in a specific state. "
" F: List all states. "
" G: List all data for a specific classification. "
" H: List all classifications. "
" I: List all data in a latitude and longitude box. "
" J: List all data in a range of elevations. "

A: Full data record for each line in the file that matches the name with the user's input exactly.
B: A simple list of all the place names, without duplicates, without other data.
C: A simple list of states that contain a county that matches the user's input exactly.
D: A simple list of all counties, without duplicates, without other data.
E: Full data record for each line in the file that is within the state that matches the user's input exactly.
F: A simple list of all states, without duplicates, without other data.
G: Full data record for each line in the file that is classified by the classification that matches the user's input exactly.
H: A simple list of all classifications, without duplicates, without other data.
I: Full data record for each line in the file within the box created by the latitude's degrees and minutes + the boxsideminutes, and the longitude's degrees and minutes + the boxsideminutes.
J: Full data record for each line in the file that is between the elevations given by the user, inclusive.
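For a range search such as choice J, an ordered structure lets you find both ends of the range without scanning every record.  The sketch below assumes a std::map keyed by elevation; ElevationIndex and rowsInElevationRange are hypothetical names.  Each distinct elevation is stored once, together with the row numbers of the records at that elevation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical index: elevation -> row numbers of records at that elevation.
using ElevationIndex = std::map<int, std::vector<std::int32_t>>;

// Appends the row numbers of every record with low <= elevation <= high.
void rowsInElevationRange(const ElevationIndex& index, int low, int high,
                          std::vector<std::int32_t>& rowKeys) {
    assert(low <= high);
    auto first = index.lower_bound(low);   // first elevation >= low
    auto last = index.upper_bound(high);   // first elevation > high

    // Everything in [first, last) is inside the inclusive range.
    for (auto it = first; it != last; ++it)
        rowKeys.insert(rowKeys.end(), it->second.begin(), it->second.end());
}
```

The row numbers collected here are links; the full data record for each one would still be assembled by following those links into the rest of your structure.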


  

Important grading note

You will present your data base structure in an oral presentation during the Final period, Tuesday, December 11th from 6:00 to 8:00 pm.  You are welcome to have Scott assess and pre-grade your project before the presentation.
 

Grading

The Final (Project 4) is worth 20 points toward the final grade.  It will be graded according to the criteria on the Project 4 Grade Report.
 

Date

Your Final will be graded on Tuesday, December 11th, from 6:00 to 8:00 pm, the scheduled Final Period for this course.  You are encouraged to have your work, except the presentation, graded before the Final Period.  The Final Period is absolutely the last time that you can get a grade for any work in the course.
 

  Scott Badman   Office: B132   Phone: 353-2250   sbadman@parkland.edu  
