Homework 6: Hash Tables

In this assignment, you'll be exploring hash tables with various collision resolution strategies. Your task is to implement these approaches and measure their performance using profiling techniques.

Objectives

  1. Implement hash tables using both open addressing and chaining collision resolution strategies.
  2. Design and implement efficient resizing and rehashing mechanisms to maintain O(1) average time complexity.
  3. Compare and analyze the performance tradeoffs between open addressing and chaining approaches.
  4. Optimize hash table performance through load factor tuning and prime number table sizing, and compare observed performance to theoretical performance.

Important Dates

EventDayDateTime
ReleaseWedApr 0109:00 AM ET
DueFriApr 1005:00 PM ET

Homework Description

In this assignment, you'll implement two key techniques for building hash maps: open addressing and chaining.

Note, you are not allowed to use (1) Java Collections (built-in data structures), (2) data structures that are not covered in the course yet. The exception is for the auxiliary data structure to implement the chaining strategy in ChainingHashMap. It is okay to use Java's ArrayList or LinkedList (or their equivalents/other data structures we developed in the course or earlier homework).

Starter Code

Download the starter code from the following link: hw6.zip.

The starter code is a zip file named hw6-starter.zip. You must unzip this file. After unzipping, you will get a folder called hw6-starter. Open this folder as an IntelliJ project.

Tasks

This assignment consists of two main tasks:

  1. Implementation and Profiling: Implement and profile two hash map variants: ChainingHashMap and OpenAddressingHashMap.
  2. Discussion: Provide insights on profiling techniques and implementation details for both hash maps.

Implement Hash Tables

Your first task is to implement two hash table classes, ChainingHashMap and OpenAddressingHashMap, which both implement the Map interface.

You have flexibility in your implementation:

  • For chaining, choose an auxiliary data structure (e.g., ArrayList or LinkedList) that best suits your needs.
  • For open addressing, select a probing strategy that minimizes collisions. You should implement linear probing and at least one other strategy for comparison!
  • Decide on a threshold for rehashing the table based on load factor to optimize performance.

Your task is to figure out how to implement and optimize these hash tables. Additionally, report which one performs better (ChainingHashMap or OpenAddressingHashMap).

Requirements:

  • Use a plain (generic) array as the underlying data structure for both ChainingHashMap and OpenAddressingHashMap.
  • Manually resize the table to maintain an acceptable load factor.
  • When selecting a table size (capacity), begin by starting at 5. As you resize, choose a prime number that is larger than double the previous size.
  • All critical map operations must run in average (or amortized) O(1) expected time.

You can use the provided unit tests to ensure your implementation is correct. Feel free to add more tests to MapTest but do not modify the provided test cases.

Hash Table Performance & Discussion

For your second task, you'll profile the performance of your hash tables. Since for this homework, you've built a couple more map implementations, a natural starting point would be with the benchmarking code you wrote for HW4. At a minimum, you should aim to answer at least two the following questions, plus a new question of your own:

  • Do your hash tables consistently outperform your AVL tree from HW4? Justify your answer with data drawn from running the same tests on both implementations, and if not, offer a hypothesis about why. If you answer this question, you must also include your AVL tree implementation that you benchmarked against.

  • Does the load factor of your hash table have the impact you'd expect in terms of performance? Do you see a meaningful difference between a table that is "somewhat full" (for example, a max load factor of .5) vs one that is "very full" (for example, a max load factor of .95)? Does a load factor of .5 vs .75 make a meaningful difference?

  • Which hash table implementation results in fewer collisions? Do you observe the relationship between load factor and number of collisions that was discussed in the course book? To answer this, you could augment your hash tables so they can track how many collisions occurred (similarly to what was done with the PerformanceMonitor from HW3).

  • Do you observe a relationship between the number of collisions that occur during your test, and the time it took to run? How strong is the relationship, if any? You can perform a linear regression with Excel or similar tools to identify the strength of the relationship.

Note that while we encourage you to use your experiment for HW4 as a starting point, it is not sufficient on its own. We would not expect that building the map out of elements that are in sorted vs random order to have any meaningful impact on the performance of a hashtable, whereas the impact on an (unbalanced) BST is quite severe. At the same time, there are more interesting conclusions to be drawn from observing the number of collisions and the impact of load factor.

Use your README (src/main/java/README.md) file to explain why one of your implementations (OpenAddressingHashMap or ChainingHashMap) is the better performer, or why there is no noticeable difference. Describe the approaches you tried, such as probing strategies and rehashing strategies, and why you chose them. Note any specific tweaks that improved or worsened performance. Record failed or abandoned attempts, if any, and why you think they failed and what you learned from their failure.

You must turn in the code that you wrote for your experiment.

Submission

In preparation, you must (1) run checkstyle to ensure your code is checkstyle compliant (if not, fix the styling issues). (2) Create a zip file containing the src folder (and all therein). (3) Submit the zip file to Gradescope.

You can submit as many times as you like until deadline. We will only grade the last submission.

Rubric

Teaching assistants will evaluate your submission using the following criteria:

Please note, we reserve the right to change this rubric without advance notice, even after the submission deadline. However, the chances of doing so are slim, and at any rate, should not be a significant change.

  • Spec 1: The OpenAddressingHashMap constructor is correctly and efficiently implemented. [1 point]

  • Spec 2: The implementation of OpenAddressingHashMap.insert is correct and efficient, demonstrating a thorough comprehension of the open addressing technique and its operational intricacies. [3 points]

  • Spec 3: The implementation of OpenAddressingHashMap.remove is correct and efficient, demonstrating a thorough comprehension of the open addressing technique and its operational intricacies. [2 points]

  • Spec 4: The implementation of OpenAddressingHashMap.put is correct and efficient, allowing to update the value associated with a key. [1 point]

  • Spec 5: The implementation of get and has operations in OpenAddressingHashMap are correct and efficient, ensuring fast lookups and membership tests. [1 point]

  • Spec 6: The implementation of OpenAddressingHashMap.size is correct and efficient. [0.5 points]

  • Spec 7: The iterator of OpenAddressingHashMap is implemented correctly and efficiently. [2 points]

  • Spec 8: The ChainingHashMap constructor is correctly and efficiently implemented. [1 point]

  • Spec 9: The implementation of ChainingHashMap.insert is correct and efficient, demonstrating a thorough comprehension of the chaining technique and its operational intricacies. [2 points]

  • Spec 10: The implementation of ChainingHashMap.remove is correct and efficient, demonstrating a thorough comprehension of the chaining technique and its operational intricacies. [2 points]

  • Spec 11: The implementation of ChainingHashMap.put is correct and efficient, allowing to update the value associated with a key. [1 point]

  • Spec 12: The implementation of get and has operations in ChainingHashMap are correct and efficient, ensuring fast lookups and membership tests. [1 point]

  • Spec 13: The implementation of ChainingHashMap.size is correct and efficient. [0.5 points]

  • Spec 14: The iterator of ChainingHashMap is implemented correctly and efficiently. [2 points]

  • Spec 15: The submission contains (in the README.md file) answers to the discussion question about benchmarking. The points made in the answer are adequate and backed by experimental data. [4 points]

  • Spec 16: The code is well-organized, readable, and follows coding standards. [1 point]

The total score for this homework is 25 points.