Sunday, January 19, 2014

Wednesday, January 1, 2014

What does the Reactive Revertible Library do?

With this blog, I'm going to introduce to the world a software library that I've been working on for about 18 months now.  I call it the Reactive Revertible Library (R2L).  This first post will give a basic introduction to what the library's capabilities and benefits are.

Reactive

The Reactive Revertible Library supports a reactive programming.  Reactive programming is oriented around data flows and change propagation.  It's an idea that has been garnering a lot of interest lately, with initiatives such as Reactive Extensions for .NET, but it is still mainly an academic idea at this point.

As a relatable analog, think about formulas in Excel.  One can enter a formula and see the result on the screen.  If any values that the formula uses changes, then the result of the formula is updated on the screen automatically.  Programmers rarely have this luxury, however.

When a value changes in the data model of a program, for example, there may be many things that need to be updated: the GUI may need to be refreshed, business rules may need to be re-evaluated, or a business process may need to be resumed (if it was waiting on that particular change), or the updated value may need to be sent and stored in a remote location.  But, the programmer usually has to manually specify what, if any, steps to perform when a value is updated and what order those updates occur in.  This would be like an Excel user having to go to each cell and telling it to recalculate its value.  Ensuring that all proper updates occur in the correct order when a value changes is a non-trivial problem that is the source of many bugs and often obstructs the modularity of the program.

Deficiencies Of The Observer Pattern


Traditionally, the problem of change propagation has been addressed using the Observer Pattern, which promotes modularity by decoupling the object that owns the data from the objects that need to be notified when the value changes.  However, there are many deficiencies to this pattern, many of which well documented in the white paper Deprecating The Observer Pattern.  There are a few specific problems I'd like highlight, however:

Coupled With Specific Technologies - Most libraries that support the Observer Pattern depend upon a technology designed for a specific problem-domain.  For example, Windows Presentation Foundation (WPF), Microsoft premier GUI technology, supports binding expressions, for instance.  However, if you need binding expressions that aren't related to the user interface, this capability will be of limited use.

Creates Lots Of Redundant, Boiler-Plate Code - In the observable pattern, an object is responsible for managing subscriptions and propagating events for all the properties it has.  Since the list of properties vary from class to class, the code for doing this cannot simply be encapsulated and reused. Instead, each class must explicitly include code for managing subscriptions and propagating events for each individual property.

Limited Expressiveness - Generally, the Observer Pattern only supports observing an unmodified property belonging to an invariant object.  Monitoring for changes of derived values (e.g., formulas) is difficult.  In cases where libraries do support observing derived values, it usually depends on a domain-specific language that has very limited expressiveness.  WPF's binding expressions support property chains, for example, but observing something as simple as the sum of two values requires the creation of a whole new class that then has to be clumsily integrated into the XAML code, where usually a simple '+' sign would normally do.

Forces Inversion Of Control - Perhaps the greatest sin of the Observer Pattern, and event-handling code in general, is that is forces inversion of control, which forces one to break apart control-flow logic into multiple, independent event handlers.  To achieve this, shared state between the event handlers must be promoted to class variables (where local variables would normally do), and each event handler must be manually crafted to determine what the current state of the control flow is (e.g., if I'm interested in drag events and the mouse is moving, has the "mouse down" event already occurred?) before the appropriate response to the event can be determined.  This is much more complex and error-prone than if the control flow was in a single method, and the resulting code is much more difficult to follow and understand (see Deprecating The Observer Pattern for more on this point).

Order Dependencies Between Notifications - This is probably the most difficult problem of the Observer Pattern to work around.  In the Observer Pattern, the order in which observers are notified of changes is undefined.  However, in some cases, the order in which changes are propagated matter.  For instance, say that an observable value B depends on another observable value A, and that observer C depends on the values of both B and A.  If the value of A changes and observer C is notified before A, then its possible that observer C will use an out-of-date version of C, a situation that the Deprecating The Observer Pattern terms a "glitch."  Depending on what observer C does, receiving this notification before other related values are updated could cause incorrect behavior.  Even if the programmer sorted out the correct order to execute the change notifications, the Observer pattern offers no way to force the event handlers to be called in that order (again, see Deprecating The Observer Pattern for more on this point).

Beyond The Observable Pattern


So, how does my library address these shortcomings?  The Reactive aspects of my library are very similar to the library described in Deprecating The Observer Pattern.  However, it does differ slightly in some implementation details which aren't particularly important.  Going point-by-point:

Coupled With Specific Technologies - rather than being created as a part of another library to address the data propogation needs of that particualr library (a GUI library, for example), this library is designed to work stand-alone, which can then be integrated with any other library, regardless of the problem-domain.

Creates Lots Of Redundant, Boiler-Plate Code - R2L treats observable values as stand-alone objects, rather than requiring a parent class to manage change notifications.  So, the code for managing changes can be defined once, in the class Signal and its various descendants.  A class that needs a property of type T to be observable simply declares it as a property of type Signal[T] or one of its descendant classes (Var[T], for example, creates a observable property that can also be assigned).

Limited Expressiveness - Signal[T] is a monad.  This basically means that any expression I could apply to a value of type T and get value of type U, I can apply to a value of type Signal[T] and get a Signal[U].  This is the same as how I'm able to use the Select method in .NET to apply an expression to each member of a collection to get a new collection.  So, if I want to observe the absolute value of a signal, I can simply write mySignal.map(Math.abs(_, 2)), and get back a signal that lets me observe exactly that.  I can also create a signal for an abitrary expression, such as the sum of two numbers, using the syntax Signal{ intSignal1() + intSignal2() }.

Forces Inversion Of Control - With the rich set of combinators for events (i.e., a toolbox of functions comprising an expressive DSL for combining and manipulating events) use of lambda expressions, and the revertible capabilities (introduced in the next section), the requirement for explicit lifting and inversion of control is greatly reduced, if not fully eliminated.  To ensure that inversion of control is never needed, cps transforms, as described in Deprecating The Observer Pattern, can be introduced, which is not yet implemented in this library.  I have yet to run into an situation where use of inversion of control in my library was actually necessary, however.

Order Dependencies Between Notifications - My library tracks and analyzes dependencies between observable values and ensures that event handlers are executed in a coherent order so that "glitches" do not occur.  The programmer doesn't even need to specify the correct execution order.

Revertible

Despite the useful reactive capabilities of this library, its key innovation is the introduction of the concept of revertible side-effects and an expressive embedded DSL for defining conditions that are based on timing, events, and boolean expressions.

Side-effects in a complex program are difficult to reason about.  Indeed, as has been pointed out, it is one of the major obstacles to parallelization of imperative code.  For this reason, some have taken to adopting a functional style of programming, which avoids side-effects all-together by imposing immutability of all data.  Unforunately, complete immutability is often counter-intuitive and unnatural for programmers.

Using this library, one can write in a more declarative style that is easier to reason about, without completely eliminating mutability, by writing the program in terms of of conditions and side-effects.  Rather than writing side-effects as an explicitly sequenced series of steps, you define under what conditions a side-effect should be applied, and the library ensures that the side-effect is applied once those conditions are met.  Perhaps more importantly, if the condition is no longer met at some time further down the road, then the side-effect is "reverted" or undone.  So, in this way, one can easily define a kind of invariant: one knows from a statement in the program that a given side-effect is applied when a certain condition is met, and that holds throughout the life of the program.

So, far this discussion has been rather abstract, so lets look at some concrete examples.  First, let's define some basic operations.  Two of the most basic operations relating to conditions and side-effects are once and until.

Once


A call to the once operation uses the following syntax:

once(wait-condition) {
  side-effect*
}

What this statement says is that, once the wait condition is met, then the side-effect is applied.  A WaitCondition[T] is a type defined in the library.  It is similar to an event except that it can only occur once and it has a state.  So, if a wait condition occurs before it is used in a once statement, the wait condition remembers that and when this once statement is reached the side-effect will be applied immediately.

To get an intuitive understanding of these concepts, let's model a real-life example.  Let's say a person having a birthday might be modeled as an event, because birthdays happen more than once.  A specific person having a 10th birthday could be treated as a wait condition, however, because a person can only have one 10th birthday.  So, let's say that congress passes a law that says all US citizens who are 10 or older must be immunized against some new virus.  You might model that in code like so:

for(person <- usCitizens) {
  once(person.birthdays.forAge(10)) {
    immunizeAgainstNewVirus(person)
  }
}

When this code is executed, it ensures that everyone who has already reached the age of 10 is immediately immunized.  Additionally, it ensures that once anyone reaches the age of 10 in the future, that they are immunized at that time.

Until


The until operation uses a similar syntax.

until(wait-condition) {
  side-effect*
}

In the case of until, the wait condition is first checked to see if it has occured yet or not.  If it has, then nothing is done.  If it has not, however, then the side-effect is immediately applied.  However, once the wait condition has occured, then the side-effects are reverted (hence the word "revertible" in the library's name).

To model this, let's say that the immunization is only good for ten years.  We might model that in the definition of the immunizeAgainstNewVirus method as follows:

def immunizeAgainstNewVirus(person : Person) {
  until(currentTime.timePasses(TimeSpan.years(10))) {
    person.isImmune := true
  }
}

The := is a special assignment operator that is revertible.  The above code snippet effectively sets the isImmune flag to true for the duration of 10 years, after which the isImmune flag will revert to false (unless some other side-effect sets it to true).

Only side-effects that have been coded to support reversions are revertible.  The library provides a host of such side-effects, and additional side-effects can be implemented fairly easily.  All other side-effects are permanent, so they will stand even after until's wait condition occurs.  If a call to until includes some side-effects that are revertible and others that are not, then the side-effects that are revertible will be reverted and the others will remain permanent.  Additionally, a revertible side-effect can be converted into a permanent one (more on that later).


Embedding Until & Once


once and until are themselves side-effects, so they can themselves be reverted.  Thus, one can embed calls to once and until inside each other to produce more complex behavior.

Now, it's possible that congress might decide to repeal their law sometime in the future.  We can model this easily by simply surrounding a previous code in a call to until.

until(congressRepealsLaw) {
  for(person <- usCitizens) {
    once(person.birthdays.forAge(10)) {
      immunizeAgainstNewVirus(person)
    }
  }
}

This means that, until the congressRepealsLaw wait condition occurs, the library will ensure that everyone who turns the age of 10 is immediately immunized.  After the law is repealed, anyone turning 10 will not be immunized.

Permanently


But, there's a problem, because the call to immunizeAgainstNewVirus is also inside the until statement.  People don't loose their existing immunity just because congress repealed the law!  To fix this, we have to make the immunization permanent in the sense that one is "permanently immunized for the next 10 years," regardless of whether the conditions under which they were originally immunized have changed.  To do this, we have to convert our revertible effect into a permanent one.  To do this, we simply use the permanently operator, like so:

def immunizeAgainstNewVirus(person : Person) {
  permanently {
    until(currentTime.timePasses(TimeSpan.years(10))) {
      person.isImmune := true
    }
  }
}


Note that the "until ten years" clause appears inside the permanently scope, so it is not overridden to be permanent.  The permanently clause only overrides conditions that occur outside its scope.

Comparison

To show the advantages of the revertible capabilities of this library, let's compare the above example with what it would take the implement the same behavior in my library without using the revertible capabilities.

First, the complete listing of our example so far:

class Person {
  val isImmune = new Var(false)

  ...
}

def immunizeAgainstNewVirus(person : Person) {
  permanently {
    until(currentTime.timePasses(TimeSpan.years(10))) {
      person.isImmune := true
    }
  }
}

until(congressRepealsLaw) {
  for(person <- usCitizens) {
    once(person.birthdays.forAge(10)) {
      immunizeAgainstNewVirus(person)
    }
  }
}


Now, the same behavior implemented without using revertible capabilities (other than for the items explicitly shown, assume that the Person class between the two examples is identical):


class Person {
  private val immunizedUntil : Option[Date] = None

  def immunizeUntil(newDate : Date) =
    person.immunizedUntil match {
      case Some(currentImmunizedUntilDate) =>
        person.immunizedUntil = currentImmunizedUntilDate max currentDate.addYears(10)
      case None =>
        person.immunizedUntil = currentDate.addYears(10)
    }

  def isImmune =
    lastImmunization match {
      case Some(immunizedUntilDate) => currentDate < immunizedUntilDate
      case None => false
    }

  ...
}


def immunizeAgainstNewVirus(person : Person) {
  person.immunizeUntil(currentDate.addYears(10))
}

for(person <- usCitizens) {
  if(person.currentAge >= 10) {
    immunizeAgainstNewVirus(person)
  } else {
    //event handler for when a person's birthday occurs
    def immunizeIf10YearsOld(age : Int) {
      if(age >= 10) {
        immunizeAgainstNewVirus(person)
        //we only want to immunize them once
        person.birthday.unsubscribe(immunizeIf10YearsOld)
      }
    }
    person.birthday.subscribe(immunizeIf10YearsOld)
    //add an event handler to remove the event handle we just
    //subscribed once congress repeals the law
    congressRepealsLaw.triggered.subscribe(_ =>
      person.birthday.unsubscribe(immunizeIf10YearsOld)
    )
  }
}

Removing use of revertible side-effects caused two major changes to the code:

Firstly, I had to explicitly model the the immunization expiration dates, since this can no longer be handled by the until operator.  Because one's immunity might come from multiple sources (perhaps certain medications can give you a shorter-term immunity), I had to write code check the current expiration date and make sure I'm not overwriting a longer-term expiration date with a shorter term one.

Secondly, I not only had to register event handlers to perform checks each time they had a birthday, I had to set up an event handler to de-register those handlers in the case the congress repeals the law.  Because I need to keep a reference to the event handler so I can de-register it later, I am precluded from using lambda expressions.  Also, note that there's multiple conditions under which I need to unsubscribe (in the case of the repeal of the law or in the case that the person is immunized), and that the call to immunizeAgainstNewVirus also had to be duplicated (once to be called in the case the person is already 10 and once in the birthday event handler).

I think that this demonstrates that the second example is longer, less amenable to changes, and harder to read and understand than the version which uses the revertible capabilities.