Homework 6: Hash Tables
In this assignment, you'll be exploring hash tables with various collision resolution strategies. Your task is to implement these approaches and measure their performance using profiling techniques.
Objectives
- Implement hash tables using both open addressing and chaining collision resolution strategies.
- Design and implement efficient resizing and rehashing mechanisms to maintain O(1) average time complexity.
- Compare and analyze the performance tradeoffs between open addressing and chaining approaches.
- Optimize hash table performance through load factor tuning and prime number table sizing, and compare observed performance to theoretical performance.
Important Dates
| Event | Day | Date | Time |
|---|---|---|---|
| Release | Wed | Apr 01 | 09:00 AM ET |
| Due | Fri | Apr 10 | 05:00 PM ET |
Homework Description
In this assignment, you'll implement two key techniques for building hash maps: open addressing and chaining.
Note, you are not allowed to use (1) Java Collections (built-in data structures), (2) data structures that are not covered in the course yet. The exception is for the auxiliary data structure to implement the chaining strategy in
ChainingHashMap. It is okay to use Java's ArrayList or LinkedList (or their equivalents/other data structures we developed in the course or earlier homework).
Starter Code
Download the starter code from the following link: hw6.zip.
The starter code is a zip file named hw6-starter.zip. You must unzip this file. After unzipping, you will get a folder called hw6-starter. Open this folder as an IntelliJ project.
Tasks
This assignment consists of two main tasks:
- Implementation and Profiling: Implement and profile two hash map variants:
ChainingHashMapandOpenAddressingHashMap. - Discussion: Provide insights on profiling techniques and implementation details for both hash maps.
Implement Hash Tables
Your first task is to implement two hash table classes, ChainingHashMap and OpenAddressingHashMap, which both implement the Map interface.
You have flexibility in your implementation:
- For chaining, choose an auxiliary data structure (e.g., ArrayList or LinkedList) that best suits your needs.
- For open addressing, select a probing strategy that minimizes collisions. You should implement linear probing and at least one other strategy for comparison!
- Decide on a threshold for rehashing the table based on load factor to optimize performance.
Your task is to figure out how to implement and optimize these hash tables. Additionally, report which one performs better (ChainingHashMap or OpenAddressingHashMap).
Requirements:
- Use a plain (generic) array as the underlying data structure for both
ChainingHashMapandOpenAddressingHashMap. - Manually resize the table to maintain an acceptable load factor.
- When selecting a table size (capacity), begin by starting at 5. As you resize, choose a prime number that is larger than double the previous size.
- All critical map operations must run in average (or amortized) O(1) expected time.
You can use the provided unit tests to ensure your implementation is correct. Feel free to add more tests to MapTest but do not modify the provided test cases.
Hash Table Performance & Discussion
For your second task, you'll profile the performance of your hash tables. Since for this homework, you've built a couple more map implementations, a natural starting point would be with the benchmarking code you wrote for HW4. At a minimum, you should aim to answer at least two the following questions, plus a new question of your own:
-
Do your hash tables consistently outperform your AVL tree from HW4? Justify your answer with data drawn from running the same tests on both implementations, and if not, offer a hypothesis about why. If you answer this question, you must also include your AVL tree implementation that you benchmarked against.
-
Does the load factor of your hash table have the impact you'd expect in terms of performance? Do you see a meaningful difference between a table that is "somewhat full" (for example, a max load factor of .5) vs one that is "very full" (for example, a max load factor of .95)? Does a load factor of .5 vs .75 make a meaningful difference?
-
Which hash table implementation results in fewer collisions? Do you observe the relationship between load factor and number of collisions that was discussed in the course book? To answer this, you could augment your hash tables so they can track how many collisions occurred (similarly to what was done with the PerformanceMonitor from HW3).
-
Do you observe a relationship between the number of collisions that occur during your test, and the time it took to run? How strong is the relationship, if any? You can perform a linear regression with Excel or similar tools to identify the strength of the relationship.
Note that while we encourage you to use your experiment for HW4 as a starting point, it is not sufficient on its own. We would not expect that building the map out of elements that are in sorted vs random order to have any meaningful impact on the performance of a hashtable, whereas the impact on an (unbalanced) BST is quite severe. At the same time, there are more interesting conclusions to be drawn from observing the number of collisions and the impact of load factor.
Use your README (src/main/java/README.md) file to explain why one of your implementations (OpenAddressingHashMap or ChainingHashMap) is the better performer, or why there is no noticeable difference. Describe the approaches you tried, such as probing strategies and rehashing strategies, and why you chose them. Note any specific tweaks that improved or worsened performance. Record failed or abandoned attempts, if any, and why you think they failed and what you learned from their failure.
You must turn in the code that you wrote for your experiment.
Submission
In preparation, you must (1) run checkstyle to ensure your code is checkstyle compliant (if not, fix the styling issues). (2) Create a zip file containing the src folder (and all therein). (3) Submit the zip file to Gradescope.
You can submit as many times as you like until deadline. We will only grade the last submission.
Rubric
Teaching assistants will evaluate your submission using the following criteria:
Please note, we reserve the right to change this rubric without advance notice, even after the submission deadline. However, the chances of doing so are slim, and at any rate, should not be a significant change.
-
Spec 1: The
OpenAddressingHashMapconstructor is correctly and efficiently implemented. [1 point] -
Spec 2: The implementation of
OpenAddressingHashMap.insertis correct and efficient, demonstrating a thorough comprehension of the open addressing technique and its operational intricacies. [3 points] -
Spec 3: The implementation of
OpenAddressingHashMap.removeis correct and efficient, demonstrating a thorough comprehension of the open addressing technique and its operational intricacies. [2 points] -
Spec 4: The implementation of
OpenAddressingHashMap.putis correct and efficient, allowing to update the value associated with a key. [1 point] -
Spec 5: The implementation of
getandhasoperations inOpenAddressingHashMapare correct and efficient, ensuring fast lookups and membership tests. [1 point] -
Spec 6: The implementation of
OpenAddressingHashMap.sizeis correct and efficient. [0.5 points] -
Spec 7: The iterator of
OpenAddressingHashMapis implemented correctly and efficiently. [2 points] -
Spec 8: The
ChainingHashMapconstructor is correctly and efficiently implemented. [1 point] -
Spec 9: The implementation of
ChainingHashMap.insertis correct and efficient, demonstrating a thorough comprehension of the chaining technique and its operational intricacies. [2 points] -
Spec 10: The implementation of
ChainingHashMap.removeis correct and efficient, demonstrating a thorough comprehension of the chaining technique and its operational intricacies. [2 points] -
Spec 11: The implementation of
ChainingHashMap.putis correct and efficient, allowing to update the value associated with a key. [1 point] -
Spec 12: The implementation of
getandhasoperations inChainingHashMapare correct and efficient, ensuring fast lookups and membership tests. [1 point] -
Spec 13: The implementation of
ChainingHashMap.sizeis correct and efficient. [0.5 points] -
Spec 14: The iterator of
ChainingHashMapis implemented correctly and efficiently. [2 points] -
Spec 15: The submission contains (in the
README.mdfile) answers to the discussion question about benchmarking. The points made in the answer are adequate and backed by experimental data. [4 points] -
Spec 16: The code is well-organized, readable, and follows coding standards. [1 point]
The total score for this homework is 25 points.