Homework 4: Self-balancing BST & Priority Queue

In this homework, you will implement and analyze self-balancing binary search trees (AVL Tree and Treap) and a binary heap-based priority queue. You'll also explore the performance of these data structures through benchmarking and tackle a selection problem.

Objectives

  • Implement and thoroughly test an AVL tree, demonstrating understanding of tree balancing techniques and their impact on performance.
  • Design and implement a Treap, showcasing comprehension of probabilistic data structures and their benefits.
  • Develop a binary heap-based priority queue, reinforcing understanding of heap properties and their applications.
  • Apply benchmarking techniques to compare the performance of different map implementations.
  • Analyze and evaluate different strategies for solving the selection problem, considering time and space complexity.
  • Strengthen skills in writing effective unit tests for complex data structures.
  • Demonstrate proficiency in recursive implementations of tree operations.
  • Gain practical experience in using Java Microbenchmark Harness (JMH) for performance analysis.

Important Dates

EventDayDateTime
ReleaseFriOct 1109:00 AM ET
DueWedOct 2305:00 PM ET

Homework Description

In this assignment, you will delve into the implementation and analysis of self-balancing binary search trees and a priority queue. You'll start by implementing an AVL tree, a height-balancing binary search tree, followed by a Treap, which uses randomization for balance. You'll also implement a binary heap-based priority queue.

You'll write comprehensive unit tests for each implementation, reinforcing the importance of thorough testing in software development. You'll also use the Java Microbenchmark Harness (JMH) to compare the performance of your implementations against other map implementations, gaining practical experience in benchmarking.

Finally, you'll analyze different strategies for solving the selection problem, applying your knowledge of data structures and algorithms to evaluate their correctness and efficiency.

This assignment will challenge you to apply theoretical concepts to practical implementations, hone your problem-solving skills, and deepen your understanding of advanced data structures.

Note, you are not allowed to use (1) Java Collections (built-in data structures), (2) data structures that are not covered in the course yet. (It is okay to use Java's Stack for implementing the iterator of AVL/Treap as done in the starter code for the [regular] BST.)

Starter Code

Download the starter code from the following link: hw4.zip

The starter code is a zip file named hw4-starter.zip. You must unzip this file. After unzipping, you will get a folder called hw4-starter. Open this folder as an IntelliJ project.

Tasks

This assignment consists of the following tasks:

  • Implement and test AVL tree, Treap, and Binary Heap
  • Discussion on benchmarking and selection problem

AVL Tree

Your first task is to write unit tests and complete the implementation of AvlTreeMap, which provides a balanced binary search tree by maintaining the AVL properties discussed in lecture.

Testing AvlTreeMap

We recommend following a test-first development approach: start by adding unit tests in AvlTreeMapTest before diving into the implementation. AvlTreeMapTest extends BinarySearchTreeMapTest, which in turn extends OrderedMapTest, and so on, up to MapTest. Each of these classes introduces more specific test cases that are already provided for you. Take time to study these existing tests to understand how to design effective unit tests for AvlTreeMap.

While writing tests, it's important to note that unit testing generally follows a black-box testing strategy. This means that your tests should be based on the expected behavior of the interface, without relying on the specifics of how that behavior is achieved internally. For example, the internal structure of the binary search tree should not directly influence your tests.

However, for this homework, we've provided a tool to help you better understand how your AvlTreeMap works behind the scenes: the toString() method, which uses BinaryTreePrinter.printBinaryTree() to visualize the tree's structure. This visualization acts like a "window" into the underlying structure of your AVL tree, helping you to debug and verify that rotations and balances are occurring as expected. Remember, this is just a visual aid and not something you would typically use in real-world testing.

Please do not modify the toString() method or the BinaryTreePrinter class, as they are intended to be used in their current form.

Implementing AvlTreeMap

Once you have a good set of tests, you can move on to implementing AvlTreeMap. To get started, use the provided BinarySearchTreeMap as a base. Copy its core operation implementations and adjust them to enforce AVL properties. Make sure to retain the recursive approach for these core operations—iterative methods are not allowed. Also, note that your internal Node class must not use a parent pointer.

Important Considerations for Implementation:

  • When removing a node with two non-null children, replace it with the maximum value found in its left subtree.

  • Aim for efficiency: all core operations should run in $\Omicron(\lg n)$ time. Efficiently track the height of each node without introducing operations that result in $\Omicron(n)$ complexity.

  • You can use the iterator of BinarySearchTreeMap as it is for AvlTreeMap. Feel free to copy the implementation of the iterator from BinarySearchTreeMap to AvlTreeMap.

Treap

Your second task is to complete the implementation of TreapMap, which provides a probabilistically balanced binary search tree by maintaining the treap properties discussed in lecture.

Testing TreapMap

As with the previous task, we recommend adopting a test-first development approach: start by adding unit tests in TreapMapTest before writing your implementation. The TreapMapTest class extends BinarySearchTreeMapTest, which extends OrderedMapTest, and so forth, up to MapTest. Each of these classes provides additional test cases. Reviewing these cases can help you understand what to test and how to write comprehensive unit tests for TreapMap.

Testing a data structure like a Treap presents unique challenges due to its use of randomness in determining node priorities. This randomness affects the sequence of rotations, making it harder to predict outcomes. To address this, you can make use of a seeded random number generator. In Java, random number generators produce sequences based on an initial "seed" value, which means that using the same seed will produce the same sequence of random numbers each time. This approach allows you to write tests that are both deterministic and reproducible.

We have included some fields in TreapMap (do not modify these) to help you control and verify your implementation. You might also consider adding a constructor to TreapMap that accepts a seed value for instantiating the Random class. This will allow you to control the priority values generated during testing, making it possible to anticipate specific rotations and tree structures. Make sure that the default constructor does not perform any special seeding of the rand field, as this is necessary for our AutoTest cases.

For more information on using seeds with Java's Random class, consult these resources:

Implementing TreapMap

After designing a solid set of tests, move on to implementing TreapMap. Use the provided BinarySearchTreeMap as a starting point, adapting its core operations to preserve the additional treap properties. Ensure that your implementation remains efficient, aiming for $\Omicron(\log n)$ time complexity for core operations. Similar to the AVL tree, you should retain the recursive approach for these operations and avoid using a parent pointer in the internal Node class. Moreover, when it comes to the iterator, you can use the iterator of BinarySearchTreeMap as it is for TreapMap.

Note: The Treap should be a min-heap over priorities.

Important Reminder: Your implementation must retain the default constructor and not modify its behavior. Any changes to this constructor could cause your solution to fail the AutoTest cases.

Binary Heap

Your next task is to implement the BinaryHeapPriorityQueue class, which manages a priority queue using a binary heap data structure with an array-based representation. This class should implement the PriorityQueue interface provided in the starter code.

Testing BinaryHeapPriorityQueue

As with the previous tasks, we strongly encourage a test-first development approach. Before implementing the heap operations, start by adding unit tests in the PriorityQueueTest class.

Writing unit tests for a priority queue should focus on verifying the behavior defined by the interface, rather than the internal implementation details. In other words, base your tests on what the PriorityQueue is expected to do—such as adding elements, removing the element with the highest priority, and returning the best—without making assumptions about how these actions are performed internally.

Implementing BinaryHeapPriorityQueue

After writing your tests, proceed with implementing the BinaryHeapPriorityQueue class. Ensure that your implementation maintains the heap property throughout all operations, aiming for efficient runtimes: specifically, $\Omicron(\log n)$ time complexity for insertion and removal, and $\Omicron(1)$ for best.

Important Considerations:

  • Your implementation must use the ranked array representation of a binary heap. Use Java's ArrayList to handle dynamic resizing of the ranked array when the heap reaches capacity.

  • Recall that any implementation of PriorityQueue must provide two constructors:

    • A default constructor (no arguments) that uses the natural ordering of the elements.
    • A non-default constructor that allows a Comparator to be provided, which overrides the natural ordering of the element types.
  • For the iterator of BinaryHeapPriorityQueue, implement a level-order traversal of the binary heap. You do not need any auxiliary data structures to implement the iterator!

Discussion

Part 1: Benchmarking

Once you're confident that your AvlTreeMap and TreapMap implementations are functioning correctly, it's time for some benchmarking!

We have provided a JmhRuntimeTest.java file, which benchmarks various Map implementations using an experiment defined in the Experiment.java file. The experiment builds a map containing a frequency count of all "words" that occur in an input (text) file.

You should not modify the JmhRuntimeTest or the Experiment.run method. However, feel free to change the Experiment.DATA_FILE to run the experiment on a variety of datasets. Several datasets are provided (see the src/test/resources folder), but you are welcome to create your own. Project Gutenberg is a great source for "natural language" text data if you wish to try additional datasets. (Creating new datasets is optional.)

Note: If you are using a Windows machine, follow the instructions in the Experiment.getPath method to make the necessary adjustments.

Please include the results from running JMH for different datasets in your README file (including the raw data). Afterward, describe your observations and try to explain them using your understanding of the data structures being benchmarked. In other words, why do you think the results are as they are?

Part 2: Selection Problem

Consider the following programming challenge:

You are given an array of $n$ integers in no particular order. Write a program that prints out the $k^{\text{th}}$ best (max or min) value where $k \leq n$.

Students came up with the following strategies to solve this problem (assuming "best" means "max"):

Strategy 1: Insert all the elements into a binary (max) heap and then remove the best (max) element $k$ times. The $k^{\text{th}}$ best value is the $k^{\text{th}}$ element removed.

Strategy 2: Insert the first $k$ elements into a binary (min) heap. For the remaining $n - k$ elements, compare each to the best (min) in the heap. If the element is larger, remove the best (min) and insert the (larger) element instead. When done, the best element in the heap is the $k^{\text{th}}$ best value in the collection.

For each strategy, determine if it is correct or incorrect. If incorrect, provide a brief explanation of why it is incorrect. Additionally, state the time and space complexity of each strategy.

Submission

In preparation, you must (1) run checkstyle to ensure your code is checkstyle compliant (if not, fix the styling issues). (2) Create a zip file containing the src folder (and all therein). (3) Submit the zip file to Gradescope.

You can submit as many times as you like until deadline. We will only grade the last submission.

Rubric

Teaching assistants will evaluate your submission using the following criteria:

Please note, we reserve the right to change this rubric without advance notice, even after the submission deadline. However, the chances of doing so are slim, and at any rate, should not be a significant change.

  • Spec 1: The AvlTreeMap.insert operation is correctly and efficiently implemented, maintaining AVL properties. [4 points]

  • Spec 2: The AvlTreeMap.remove operation is correctly and efficiently implemented, maintaining AVL properties. [4 points]

  • Spec 3: The AvlTreeMap.put operation is correctly and efficiently implemented. [1 point]

  • Spec 4: The AvlTreeMap.get operation is correctly and efficiently implemented. [1 point]

  • Spec 5: The AvlTreeMap iterator is correctly and efficiently implemented. [1 point]

  • Spec 6: The TreapMap.insert operation is correctly and efficiently implemented, maintaining Treap properties. [3 points]

  • Spec 7: The TreapMap.remove operation is correctly and efficiently implemented, maintaining Treap properties. [3 points]

  • Spec 8: The TreapMap.put and TreapMap.get operations are correctly and efficiently implemented. [1 point]

  • Spec 9: The BinaryHeapPriorityQueue constructors are correctly implemented. [1 point]

  • Spec 10: The BinaryHeapPriorityQueue.insert operation is correctly and efficiently implemented maintaining heap properties. [2 points]

  • Spec 11: The BinaryHeapPriorityQueue.remove operation is is correctly and efficiently implemented maintaining heap properties. [2 point]

  • Spec 12: The BinaryHeapPriorityQueue.best operation is correctly and efficiently implemented implemented. [0.5 points]

  • Spec 13: The BinaryHeapPriorityQueue iterator is correctly implemented. [1 points]

  • Spec 14: Comprehensive unit tests are provided for all implementations. [5 points]

  • Spec 15: The benchmarking discussion in the README provides insightful analysis. [1 point]

  • Spec 16: The analysis of the selection problem strategies is correct and thorough. [1.5 point]

  • Spec 17: The code is well-organized, readable, and follows coding standards. [1 point]

The total score for this homework is 33 points.