Handout

assignment7-actor-binary-tree.zip last updated: 28th April, 2016 10.40am

Assignment Setup Instructions

Copy your local repository directory of assignment 5 or 6 you to a new directory (say parcon-groupXX-assign7), and use the following commands to initialize the new directory.

git checkout --orphan actorbintree
git rm -rf .

(Note the dot at the end of the second command.) You may now download the handout, and copy the handout files into the new directory. As before, make sure that you do not create any sub-folders during the extraction. For instance, if your repo is /home/user/myrepo and the handout has files: src, and build.sbt, extract it so that /home/user/myrepo/src is the location of src, and /home/user/myrepo/build.sbt is the location of build.sbt. Once this is done, run the following commands

git add src
git commit -m "Initial Commit."

As before, you can commit your changes using: git commit -a -m "Informative message about what was done. You can change this!". To push your changes to the remote repository use

git push origin actorbintree

Running Online Tests

Go to the grading interface: https://larasrv03.epfl.ch/
Go to the Tests tab of the interface and choose Assignment 7 from the drop-down list titled Testing.
You will see your commits, and can run tests on them.

Problem Statement

Binary Trees

Binary trees are tree based data structures where every node has at most two children (left and right). In this exercise, every node stores an integer element. From this we can build a binary search tree by requiring for every node that

values of elements in the left subtree are strictly smaller than the node's element
values of elements in the right subtree are strictly bigger than the node's element

In addition, there should be no duplicates, hence we obtain a binary tree set.

Your task in this assignment is to implement an actor-based binary tree set where each node is represented by one actor. The advantage of such an actor-based solution is that it can execute fully asynchronously and in parallel.

The API

You can find the message-based API for the actor-based binary tree to be implemented in the supplied BinaryTreeSet object in the file BinaryTreeSet.scala.

The operations, represented by actor messages, that the implementation should support are the following:

Insert
Remove
Contains

All three of the operations expect an ActorRef representing the requester of the operation, a numerical identifier of the operation and the element itself. Insert and Remove operations should result in an OperationFinished message sent to the provided requester ActorRef reference including the id of the operation. Insert and Remove should return an OperationFinished message even if the element was already present in the tree or was not found, respectively. Contains should result in a ContainsResult message containing the result of the lookup (a Boolean which is true if and only if the element is in the tree when the query arrives) and the identifier of the Contains query.

Handling of Removal

You should observe that both the Insert and Contains operations share an important property, namely, they only traverse a linear path from the root of the tree to the appropriate inner node or leaf. Since the tree nodes are actors which process messages one-by-one, no additional synchronization is needed between these operations. Removal in a binary tree unfortunately results in tree restructuring, which means that nodes would need to communicate and coordinate between each other (while additional operations arrive from the external world!).

Therefore, instead of implementing the usual binary tree removal, in your solution you should use a flag that is stored in every tree node (removed) indicating whether the element in the node has been removed or not. This will result in a very simple implementation that is concurrent and correct with minimal effort. Unfortunately this decision results in the side effect that the tree set accumulates “garbage” (elements that have been removed) over time.

Garbage Collection

As we have seen, removal of entries can be implemented simply by using a removal flag with the added cost of growing garbage over time. To overcome this limitation you will need to implement a “garbage collection” feature. Whenever your binary tree set receives a GC message, it should clean up all the removed elements, while additional operations might arrive from the external world.

The garbage collection task can be implemented in two steps. The first subtask is to implement an internal CopyTo operation on the binary tree that copies all its non-removed contents from the binary tree to a provided new one. This implementation can assume that no operations arrive while the copying happens (i.e. the tree is protected from modifications while copying takes places).

The second part of the implementation is to implement garbage collection in the manager (BinaryTreeSet) by using the copy operation. The newly constructed tree should replace the old one and all actors from the old one should be stopped. Since copying assumes no other concurrent operations, the manager should handle the case when operations arrive while still performing the copy in the background. It is your responsibility to implement the manager in such a way that the fact that garbage collection happens is invisible from the outside (of course additional delay is allowed). For the sake of simplicity, your implementation should ignore GC requests that arrive while garbage collection is taking place.

Ordering Guarantees

Replies to operations may be sent in any order but the contents of ContainsResult replies must obey the order of the operations. To illustrate what this means observe the following example:

Client sends:

Insert(testActor, id=100, elem=1)
Contains(testActor, id=50, elem=2)
Remove(testActor, id=10, elem=1)
Insert(testActor, id=20, elem=2)
Contains(testActor, id=80, elem=1)
Contains(testActor, id=70, elem=2)

Client receives:

ContainsResult(id=70, true)
OperationFinished(id=20)
OperationFinished(id=100)
ContainsResult(id=80, false)
OperationFinished(id=10)
ContainsResult(id=50, false)

While the results seem “garbled”, they actually strictly correspond to the order of the original operations. On closer examination you can observe that the order of original operations was [100, 50, 10, 20, 80, 70]. Now if you order the responses according to this sequence the result would be:

Insert(testActor, id=100, elem=1) -> OperationFinished(id=100)
Contains(testActor, id=50, elem=2) -> ContainsResult(id=50, false)
Remove(testActor, id=10, elem=1) -> OperationFinished(id=10)
Insert(testActor, id=20, elem=2) -> Insert(elem=2, id=20)
Contains(testActor, id=80, elem=1) -> ContainsResult(id=80, false)
Contains(testActor, id=70, elem=2) -> ContainsResult(id=70, true)

As you can see, the responses the client received are the same, hence they must have been executed sequentially, and only the responses have arrived out of order. Thus, the responses obey the semantics of sequential operations – it is simply their arrival order is not defined. You might find it easier for testing to use sequential identifiers for the operations, since that makes it easier to follow the sequence of responses.

You might also note that out-of-order responses can only happen if the client does not wait for each individual answer before continuing with sending operations.

While this loose ordering guarantee on responses might look strange at first, it will significantly simplify the implementation of the binary tree and you are encouraged to make full use of it.

Your task

You can find code stubs in the file BinaryTreeSet.scala which provides you with the API as described above, the BinaryTreeSet and BinaryTreeNode classes. The BinaryTreeSet represents the whole binary tree. This is also the only actor that is explicitly created by the user and the only actor the user sends messages to.

You can implement as many or as few message handlers as you like and you can add additional variables or helper functions. We provide suggestions in your code stub, marked with the comment optional, but you are free to use it fully or partially; the optional elements are not part of the tested API.

To see a binary tree in operation check our provided tests in BinaryTreeSuite.scala. Note in particular that it is the user who triggers garbage collection by sending a GC message (for the sake of simplicity of this exercise).

Don't forget to make sure that no Operation messages interfere during garbage collection and that the user does not receive any messages that may result from the copying process.

The following may be useful for your implementation:

Another way to stop an actor, besides the stop method you have seen, is to send it a PoisonPill message.
context.parent returns the ActorRef of the actor which created the current actor (i.e. its parent).
If you see a log message like the following

[INFO] [11/21/2013 14:04:13.237] [PostponeSpec-akka.actor.default-dispatcher-2] [akka://PostponeSpec/deadLetters] Message [actorbintree.BinaryTreeSet$OperationFinished] from Actor[akka://PostponeSpec/user/$e/my-actor#-1012560631] to Actor[akka://PostponeSpec/deadLetters] was not delivered. [1] dead letters encountered.

it means that one of your messages (here the OperationFinished) message was not delivered from actor my-actor to actor deadLetters—the latter is where actors forward their messages after they terminate. You should check that you do not stop actors prematurely.

A word on Actor counts:

The grader verifies that enough Actors are created when inserting elements.
The grader also verifies that enough Actors are stopped in response to a GC command (i.e. for those elements that were previously marked removed from the set).