
Drools

Java Rules Engine

Blogs

Blogs attempts to aggregate the various rule engine blogs available on the internet that might be of interest to Drools users and developers.

My hovercraft is full of eels

(The occasional public airing of my not so occasional rants)
I Suppose I should Be Flattered

I was doing a little reading on different types of rule engines today and stumbled upon this article. As I was reading through it, I had an eerie sense of déjà vu; I was sure I had read this somewhere before. In fact, it almost sounded like something I might have written.

A quick search through my blog and there it was, an entry posted in September 2004.

The thing that amuses me most about copying someone else's stuff word-for-word is that you inevitably end up copying all their (or in this case my) mistakes as well. So whilst I'm flattered that someone with obviously unquestionable integrity would even consider re-using my thoughts (and even spruce them up a bit in the process), I would have thought that bloggers, armed with trackbacks and hrefs, and indeed developers in particular, would have left copy-and-paste re-use behind.

JRules Memory Leak Gotcha

About 6 months ago we were profiling our application to ensure we had no memory leaks, etc. We did find some and we were able to fix them pretty much immediately. However, today I happened to be chatting with a colleague who is investigating a memory leak in another application and it sounded scarily similar. So in the interests of all you JRules developers, here's a little gotcha.

JRules maintains a binary, one-to-many association between the rule set (IlrRuleset) and the possibly many instances of the working memory (IlrContext), which the JRules documentation also calls a "rule engine". I'll spare you my diatribe on binary associations for now; suffice to say that if you weren't aware of this little "feature" (or if you were and simply hadn't given it much thought) you're in for a nasty surprise.

When you're done with an IlrContext the natural thing to do would be to simply remove all application references to it and let it be garbage collected. Unfortunately, due to the two-way nature of the relationship, this doesn't have the expected effect. Instead, because the rule set still holds a reference to the context, it will NEVER be garbage collected.

To combat this problem, ILOG thankfully provided a somewhat innocuous looking method IlrContext.end(). To quote from the documentation:

Prepares this rule engine instance for garbage collection. After this call, the engine will not keep any reference to this rule engine instance. The rule engine instance will be detached from the ruleset and will no longer be notified of modifications on the rules. The rule engine instance will also disconnect all its tools and all the related resources will be released. If the application does not keep this object, it is then subject to garbage collection.




In other words, anytime you've finished with a context and wish it to become a candidate for garbage collection, be sure to call end() or be prepared for a slow and painful application death as the heap runs out.

One final tip: if you make use of context pooling, be sure to also call IlrContext.reset() before returning the context to the pool. This will remove all references to your application objects within the context.

<blatant-plug>If you're in the market for a cheaper alternative, you might like to try out the latest version of Drools.</blatant-plug>

P.S. If anyone from ILOG is listening, this is exactly the kind of problem WeakReferences (and WeakHashMaps in particular) are designed to prevent :)
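To make the gotcha concrete, here's a rough sketch of the same two-way association in plain Java. SketchRuleset and SketchContext are hypothetical stand-ins I've made up for illustration, not the JRules classes; only the shape of the problem is the same. The strong set mimics the leak, end() mimics the explicit detach, and the weak flag shows roughly what the P.S. is asking for.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical stand-in for a rule set that keeps track of its contexts.
final class SketchRuleset {
    private final Set<SketchContext> contexts;

    // weak == false mimics the leak; weak == true lets the collector drop
    // entries once nothing else refers to the context.
    SketchRuleset(boolean weak) {
        contexts = weak
                ? Collections.newSetFromMap(new WeakHashMap<SketchContext, Boolean>())
                : new HashSet<SketchContext>();
    }

    void attach(SketchContext context) {
        contexts.add(context);
    }

    void detach(SketchContext context) {
        contexts.remove(context);
    }

    int attachedCount() {
        return contexts.size();
    }
}

// Hypothetical stand-in for a working memory.
final class SketchContext {
    private final SketchRuleset ruleset;

    SketchContext(SketchRuleset ruleset) {
        this.ruleset = ruleset;
        ruleset.attach(this); // the back-reference that keeps this object alive
    }

    // Plays the role of an explicit "I'm finished with this" call.
    void end() {
        ruleset.detach(this);
    }
}

final class ContextLeakDemo {
    public static void main(String[] args) {
        SketchRuleset ruleset = new SketchRuleset(false);
        SketchContext context = new SketchContext(ruleset);
        System.out.println(ruleset.attachedCount()); // 1: still reachable via the rule set
        context.end();
        System.out.println(ruleset.attachedCount()); // 0: now eligible for collection
    }
}
```

With the strong set, dropping your own reference to the context is not enough; only end() releases it. With the weak set the entry disappears on its own, though exactly when is up to the garbage collector.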

Beware The Cross-Product Join

An interesting discussion started on the Drools user mailing list regarding some problems writing a rule. The particular problem is not unique to business rules though. RETE-based inference engines share much in common with relational databases and in fact this particular problem can affect SQL queries in the same way as it affects business rules.

Let's say we wanted to find all pairs of people that were maternal siblings (i.e. that had the same mother). In SQL we could write a query like this*:

SELECT * FROM Child c1, Child c2
WHERE c1.motherId = c2.motherId
          




If we imagine we have only two children in our database, Bob (childId = 1) and Mary (childId = 2), both having the same mother, this query would generate four rows:

  • Bob, Mary
  • Mary, Bob
  • Bob, Bob
  • Mary, Mary




This is called a cross-product; every row is joined to every other row. This results in rows we're not interested in: Bob, Bob and Mary, Mary. So the first thing we would do is try and ignore rows where the child was the same:

SELECT * FROM Child c1, Child c2
WHERE c1.motherId = c2.motherId
AND c1.childId != c2.childId
          




Which results in:

  • Bob, Mary
  • Mary, Bob




The next thing you'll notice is that we still have redundant rows - rows that mean the same thing. There are a few "tricks" for avoiding this, and they really come down to a knowledge of the underlying attributes of the tables involved. The simplest in our case is to change the condition:

SELECT * FROM Child c1, Child c2
WHERE c1.motherId = c2.motherId
AND c1.childId < c2.childId
          




By imposing an arbitrary ordering, we prevent rows being joined to themselves and ensure that for any two siblings, we only get one row. Best of all, this technique translates directly into the implementation of business rules.
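For those who'd rather see it in Java than SQL, here's a small sketch of the same three queries as a nested-loop self-join over an in-memory list. The Child class, the mother id of 100 and the method names are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

final class SiblingJoinDemo {
    // Minimal stand-in for a row in the Child table.
    static final class Child {
        final int childId;
        final String name;
        final int motherId;

        Child(int childId, String name, int motherId) {
            this.childId = childId;
            this.name = name;
            this.motherId = motherId;
        }
    }

    static List<Child> sampleChildren() {
        return Arrays.asList(new Child(1, "Bob", 100), new Child(2, "Mary", 100));
    }

    // Pairs every row with every row, keeps pairs with the same mother,
    // then applies whatever extra condition the caller supplies.
    static List<String> siblings(List<Child> children, BiPredicate<Child, Child> extra) {
        List<String> rows = new ArrayList<>();
        for (Child c1 : children) {
            for (Child c2 : children) {
                if (c1.motherId == c2.motherId && extra.test(c1, c2)) {
                    rows.add(c1.name + " & " + c2.name);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Child> children = sampleChildren();
        System.out.println(siblings(children, (c1, c2) -> true).size());                     // 4
        System.out.println(siblings(children, (c1, c2) -> c1.childId != c2.childId).size()); // 2
        System.out.println(siblings(children, (c1, c2) -> c1.childId < c2.childId).size());  // 1
    }
}
```

The last condition does double duty: it rules out self-pairs and keeps only one of each mirrored pair.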

Not only do cross-products produce redundant and possibly incorrect results, the extra tuples (rows) generated as a consequence can cause your rule engine to grind to a halt.

* I realise that no one is going to model Children and Mothers in different tables but please cut me some creative slack ;-)

Drools Schmokes! - Part II

So once we'd worked out what the major hot spot in drools was, it was time to find an alternative method of conflict resolution.

As a bit of background, in simple terms, as facts are asserted, new items (or activations) are added to the agenda. In the general sense, all agenda items are equal. But some are more equal than others.

Although you should stay away from attempting to infer or impose ordering on rules, sometimes it is necessary. Sometimes you just need a couple of "cleanup" or "setup" rules, that are guaranteed to fire before or after all others. In Drools (and JESS) this is known as salience. In JRules it's called priority.

There are other reasons to order the agenda and Drools has a number of different strategies: Random; Complexity; Load Order; etc. These are then chained together. Each Resolver then gets a chance to add the item to the agenda. If it succeeds, no more resolvers are called. If however the item conflicts with one or more existing ones, all are returned and passed to the next resolver to, well, resolve LOL.

Confused? Here's a better explanation.

Looking at the implementation it was apparent that the complexity was O(n^2). Each resolver seemed to be doing a similar thing. It was also optimised quite a bit meaning there was necessarily duplicated code.

My initial gut feeling was that a priority queue was what we needed but how would we do the chaining of the different concerns?

Maybe something like a Red-Black Tree would be useful. Maybe we could implement a comparator for each strategy. Conceptually at least, we would use the first comparator to insert into the tree until we found items that were equal. From then on we would continue to insert using the next comparator, etc. This seemed too complicated and I don't do complicated very well. Makes my head hurt.

It seemed that each of the strategies was really just using a different dimension or aspect of the item to perform a sort. It was like a composite key. So what's the easiest way to sort on a composite key? Use a composite comparator. Something like:

import java.util.Comparator;
import java.util.List;

// Chains several comparators: the first one to return non-zero decides.
public class CompositeComparator implements Comparator {
    private final Comparator[] _comparators;

    public CompositeComparator(List comparators) {
        this((Comparator[]) comparators.toArray(new Comparator[comparators.size()]));
    }

    public CompositeComparator(Comparator[] comparators) {
        _comparators = comparators;
    }

    public int compare(Object o1, Object o2) {
        int result = 0;

        // Fall through to the next comparator only while the items tie.
        for (int i = 0; result == 0 && i < _comparators.length; ++i) {
            result = _comparators[i].compare(o1, o2);
        }

        return result;
    }
}
          




I tried it out using a TreeSet but it performed just as badly. Maybe I was wrong I thought to myself. So I jumped online and chatted to some of the Drools guys, Mark Proctor in particular. I described my ideas and he seemed to like them.

We did a bit of searching around for implementations we could use. I found one here but the license wasn't right. Next we thought of Doug Lea's stuff but it was overkill. Finally Peter Royal suggested looking at the commons-collections stuff and voila, there it was - PriorityBuffer - and it took a Comparator!

Hackedy, hackedy, hack and we'd replaced the original stuff with the priority queue. Time to give it a whirl.

The first step was to run the queue with a simple Comparator. Although it doesn't really do anything much, it would at least allow us to see what the basic overhead of the queue implementation was:

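import java.util.Comparator;

// Deliberately ignores its arguments (and so breaks the Comparator contract);
// only useful for measuring the raw overhead of the queue.
public class ApatheticComparator implements Comparator {
    public int compare(Object o1, Object o2) {
        return -1;
    }
}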
          




Hit run. Damn that's quick! Once more to be sure. Yup. Hmmm. Still not convinced. Add a breakpoint and run in the debugger. Sure enough it's being called. Cool! Ok now to try LoadOrder and Salience.

public class SalienceComparator implements Comparator {
    public int compare(Object o1, Object o2) {
        return ((Activation) o1).getRule().getSalience() - ((Activation) o2).getRule().getSalience();
    }
}
          




Same deal. All works just fine and after implementing a few more I was convinced that this was going to be a winner.

Now we have O(n log n). Even with all the comparators chained in, the performance doesn't change one bit. What's more, the different strategies are simple one liners making implementing new strategies almost trivial!
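If you want to play with the idea without pulling in Drools or commons-collections, here's a rough, self-contained sketch. I've swapped in java.util.PriorityQueue for PriorityBuffer (both are binary heaps, so each insert or removal costs O(log n)), and Item is a hypothetical agenda entry carrying only the attributes being sorted on. I'm also assuming higher salience should come off the queue first, with load order as the tie-breaker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class AgendaSketch {
    // Hypothetical agenda item, not the Drools Activation class.
    static final class Item {
        final String rule;
        final int salience;
        final int loadOrder;

        Item(String rule, int salience, int loadOrder) {
            this.rule = rule;
            this.salience = salience;
            this.loadOrder = loadOrder;
        }
    }

    // Each strategy is a one-liner; chaining them gives the composite key.
    static final Comparator<Item> BY_SALIENCE = (a, b) -> Integer.compare(b.salience, a.salience);
    static final Comparator<Item> BY_LOAD_ORDER = Comparator.comparingInt(item -> item.loadOrder);
    static final Comparator<Item> STRATEGY = BY_SALIENCE.thenComparing(BY_LOAD_ORDER);

    static List<Item> sampleItems() {
        return Arrays.asList(
                new Item("cleanup", -10, 0),
                new Item("ruleA", 0, 1),
                new Item("ruleB", 0, 2),
                new Item("setup", 10, 3));
    }

    // Pushes everything onto the heap, then drains it in priority order.
    static List<String> firingOrder(List<Item> items) {
        PriorityQueue<Item> agenda = new PriorityQueue<>(STRATEGY);
        agenda.addAll(items);
        List<String> order = new ArrayList<>();
        while (!agenda.isEmpty()) {
            order.add(agenda.poll().rule);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(firingOrder(sampleItems())); // [setup, ruleA, ruleB, cleanup]
    }
}
```

Adding another strategy is just one more thenComparing call on the end of the chain.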

So once more I must applaud the Drools guys for a flexible and performant design!

Drools Schmokes!

We're about to open source a new rule-based project and up until now, we'd been using various closed source rule engines to get us going. Of course this won't cut-it once we open source so we hoped that Drools would come to our rescue.

And it did. With some caveats, I can safely say that Drools is incredibly fast. Not bad for a code base that by their own admission has, quite rightly, favoured stability over performance and as such has had little or no profiling done.

Luckily we had built joodi, short for Java-Based O-O Design Inferometer (just had to get the word Inferometer into a project somehow!), test-first and as such the guts of the app was based on interfaces so cutting over to Drools was pretty easy. It took me about an hour I guess to convert the application, rules, tests and all, to run with Drools. We fired it up. All tests passed. Hooray! How happy were we!?

Next to run a "benchmark". We ran the application over the struts classes using the closed source engine first and it finished in around 9 seconds. COOL! Performance had been one of our unknowns and this was certainly well within tolerances.

Then we switched over to Drools and ran the same test. 20 minutes later it still hadn't finished. Another ten minutes I'd say and I was fast asleep. So when morning came around I leapt up and ran into the lounge to see if it had finished. It had. In 78 minutes!!!

Yikes we thought. This ain't going to cut it. Elation turned to dismay. But no real profiling of Drools had been done so surely there was room for improvement?

After a bit of chatting with the peeps in da haus, I decided to check-out the source and use JMP to do some profiling. Run it, we thought, find the lowest hanging fruit, fix it, then keep doing that until we've done all the obvious stuff.

So I cranked it up and it didn't take long to find a hot-spot. In fact it appeared that nearly 50% of the time was being spent in one small area - conflict resolution. A quick look at the source code was all that was needed to confirm my suspicions. Lots of unnecessary iteration. But again, I'm not taking anyone to task over it. I'd rather it was stable and functional first.

Looking more closely at the code, I realised that the functionality provided by the classes under scrutiny was not actually necessary, yet, for me to get joodi running. Thankfully, due to the thoughtful design, it was pretty easy to stub out, without even touching the Drools source-code.

Time to run again...holy-cow! 5 seconds! That can't be right. Run it again. Nope 5 seconds again. Quick look at the output to verify it was actually working correctly. Yup. Run all the joodi unit tests just to be sure. Yup they run just fine. It had gone from being more than 500 times slower (78 minutes against 9 seconds) to almost twice as fast!

Damn I'll try running joodi against another, bigger, project - xerces. With Drools plugged in, joodi ran in around 9 seconds. With the closed source product I gave up after 5 minutes and stopped it.

So hats off to the Drools team. Damn fine job! I'll be submitting my patches ASAP and hope to see some of that other code re-factored soon :-)

Business Rules != Scripting

As Business Rules come into vogue (again?) and the tools proliferate, there will be the usual fumbling about as many come to terms with what it all really means. How do we use these things? What should I look out for, the pitfalls, the traps? Are there any "patterns"? But above all, the greatest difficulty it seems, is coming to terms with the idea that Rule Engines ARE NOT procedural scripting languages.

The Rete Algorithm (pronounced REE-tee and Latin for net) was developed by Charles L. Forgy at Carnegie Mellon University in the 1970s and is used in most modern high-performance rule engines. Rete is able to efficiently handle very large numbers of rules.

One of the most important features of the Rete algorithm lies in its ability to identify and subsume rules with similar predicates. Because of this, predicates need only be evaluated once. This differs from procedural (Java-coded) rules where every predicate in every rule must be independently evaluated, regardless of whether the same predicate might already have been evaluated in another rule. It can also locate conflicting rules, something that's almost impossible in traditional, procedural languages.
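A toy illustration of the "evaluate once" point, and nothing like a real Rete network: below, three rules hang off a single shared condition node, so each fact is tested once no matter how many rules care about the answer. SharedConditionNode, the rule names and the order amounts are all invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical shared condition node: one predicate, many dependent rules.
final class SharedConditionNode<T> {
    private final Predicate<T> condition;
    private final List<Consumer<T>> rules = new ArrayList<>();
    private int evaluations;

    SharedConditionNode(Predicate<T> condition) {
        this.condition = condition;
    }

    void addRule(Consumer<T> rule) {
        rules.add(rule);
    }

    // The condition is tested once per fact, however many rules depend on it.
    void assertFact(T fact) {
        evaluations++;
        if (condition.test(fact)) {
            for (Consumer<T> rule : rules) {
                rule.accept(fact);
            }
        }
    }

    int evaluations() {
        return evaluations;
    }
}

final class SharedConditionDemo {
    // Returns {condition evaluations, rule activations}.
    static int[] run() {
        SharedConditionNode<Integer> largeOrder = new SharedConditionNode<>(amount -> amount > 100);
        List<String> activations = new ArrayList<>();
        for (String rule : Arrays.asList("discount", "audit", "notify")) {
            largeOrder.addRule(amount -> activations.add(rule + ":" + amount));
        }
        for (int amount : new int[] {50, 150, 250, 99}) {
            largeOrder.assertFact(amount);
        }
        return new int[] {largeOrder.evaluations(), activations.size()};
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println(result[0]); // 4 evaluations, not 3 rules x 4 facts = 12
        System.out.println(result[1]); // 6 activations: 2 matching facts x 3 rules
    }
}
```

Hand-coding the same three rules as separate if statements would test the amount twelve times for the same six activations.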

When it comes to codifying business rules, even well-factored Java code can be rather difficult to understand. After a couple of weeks away, it can often take the original developer some time to get back up to speed with their own code, let alone someone else's. On the other hand, Rules are declarative statements of fact. That means no trudging through tens or even hundreds of lines of procedural code to understand what will happen under various conditions. Weeks, months or even years later you can go back to the rule definitions and immediately understand their meaning and intent.

Rule engines share much in common with Relational Databases. They are based on tuples and predicate calculus. You don't navigate Relations (Tables), you join them. Similarly you don't navigate facts, you join them. Both suffer (or at least have suffered) similar problems in terms of performance and optimisation.

Business Rules should be simple and atomic. They should make inferences. They should not be calling out to databases nor making countless remote calls. That's what application code is for. Much like the difference between queries and stored procedures.

Analogies aside, the fact remains (no pun intended) that rules are not procedural, they are declarative statements of fact! Writing business rules requires very clear, concise and logical thought, as much if not more so than procedural code.

Rule-flow, priority, salience, etc. are mechanisms that allow some degree of procedural control and should therefore be considered a last resort, not the basis for a rule engine framework. While sometimes useful, all are frowned upon by rule advocates in much the same way as OO design frowns upon public variables.

If you can't or won't make the necessary shift from a procedural to a declarative mind set then I suggest you try BeanShell, Rhino, Groovy or any of the myriad scripting languages available. There is nothing to be ashamed of with this approach but it is most certainly NOT the same thing.

Rule Engine Notifications

I was interested to see Martin Fowler's recent entry on Notifications. If you've ever used Struts (gasp) or a similar "framework" you'll already be familiar with the concept so it's certainly nothing new but Martin has a fantastic ability to document and explain things in clear, unambiguous terms.

The most interesting thing to me was this statement "You should use Notification whenever validation is done by a layer of code that cannot have a direct dependency to the module that initiates the validation." and how this relates to the use of a rule engine within an application.

One of the biggest mistakes we've seen in using rule engines is to allow the business rules to become dependent on anything other than the business domain. For example, allowing rules to know or depend on what screen is currently displayed. Business rules should be statements of fact about business information, not application workflow or navigation state. As much as possible, we want business rules to survive changes to the form, flow, layout and even number of applications that depend on them.

Business rules make inferences about the business information (facts) presented to them. Some of these inferences will be new facts for other rules to consume and some will be facts for the caller to consume. It is this second class of facts which we classify as Notifications and which the application collects and processes. At any point in time, some of these notifications will be relevant to the application and some will not. Some will cause the application to alter its workflow, screen layouts, etc. and some may safely be ignored. The critical thing to understand is that it is the application's responsibility to filter the notifications.

For example, imagine we have an application that collects data on a customer. The data is collected over N (where typically N > 1) screens according to the business workflow requirements. After each screen of data is collected, the user hits Next to proceed, at which time the state of the domain is asserted into the rule engine, the rules are executed, and the notifications processed. Now let's imagine that one of the rules states that a customer's date of birth is required. You'll note there is no mention of a screen here, meaning that until the date of birth is filled out, the application will receive a notification indicating some problem with that field. Rather than embed knowledge of the application into the rules, the application instead filters out any notifications that are not relevant, in this case by checking to see if the field specified in the notification exists on the current page or not. If after filtering there are no notifications, the application can proceed to the next screen; otherwise a message is displayed and the user cannot proceed. Finally, on the last page, the application can check to ensure that there are no unfiltered messages before allowing the user to save. You can even get fancy and have the application take the user directly to the appropriate screen, something that would be difficult to achieve if the business rules were dependent on navigation state.
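The filtering step might look something like the following sketch. Notification, the field names and the helper methods are hypothetical; the point is only that the screen knowledge lives in the application, not in the rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NotificationFilterSketch {
    // Hypothetical notification: which field it concerns and what to tell the user.
    static final class Notification {
        final String field;
        final String message;

        Notification(String field, String message) {
            this.field = field;
            this.message = message;
        }
    }

    // What the rules might hand back for a half-completed customer.
    static List<Notification> sampleNotifications() {
        return Arrays.asList(
                new Notification("dateOfBirth", "Date of birth is required"),
                new Notification("postcode", "Postcode is required"));
    }

    static Set<String> screen(String... fields) {
        return new HashSet<>(Arrays.asList(fields));
    }

    // Keeps only the notifications whose field appears on the current screen.
    static List<Notification> relevantTo(Set<String> fieldsOnScreen, List<Notification> all) {
        List<Notification> relevant = new ArrayList<>();
        for (Notification notification : all) {
            if (fieldsOnScreen.contains(notification.field)) {
                relevant.add(notification);
            }
        }
        return relevant;
    }

    // The user may move on when nothing on this screen is outstanding.
    static boolean canProceed(Set<String> fieldsOnScreen, List<Notification> all) {
        return relevantTo(fieldsOnScreen, all).isEmpty();
    }

    public static void main(String[] args) {
        List<Notification> all = sampleNotifications();
        System.out.println(canProceed(screen("firstName", "lastName"), all)); // true
        System.out.println(canProceed(screen("dateOfBirth"), all));           // false
        System.out.println(all.isEmpty());                                    // false: can't save yet
    }
}
```

On the last page the application simply skips the filter and checks the whole list.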

Another area where we have used this approach is authorisation. We have rules that assert Permissions (a type of Notification) based on the business data, which the application can use to determine what a user is or isn't allowed to do. Again, the rules make no reference to screens or assume anything about the calling application for that matter. They simply state the facts as presented and inferred. The application then checks the results for the existence of the desired permission and if present the user is allowed to proceed; if not the user is denied access to that particular function. This also makes rendering links, enabling/disabling buttons etc. very easy while maintaining the ability to define the rules in purely business domain (i.e. application-agnostic) terms.

The concerns of a client application are typically to do with workflow and appropriate use of screen real estate. Business rules on the other hand are concerned with statements of fact about the underlying business data. Notifications allow the business rules and application workflow to vary independently according to these different concerns.

JSR-94 Not Useless But Certainly Trivial

I watch with interest as the Rule Engine chatter begins to increase. I truly believe it's an area much ignored by the great majority of developers.

If you're not aware, there is a JSR in the works to provide a common interface for integrating rule engines. In its current form, JSR-94 provides little more than a common interface for creating a context/rule engine and marshalling objects in and out. It is trivial to implement this yourself.

The fact is (pun intended) that the JSR provides little more than you would arrive at anyway when developing a system that makes use of a business rules engine while keeping in mind requirements for testability of rules and loose coupling (a.k.a. good design?). Only the JSR is considerably more verbose.

To illustrate, we have a large number of rules in our current system and a very small but useful set of interfaces:

public interface RuleEngineFactory {
    public RuleEngine createRuleEngine();
}

public interface RuleEngine {
    public void reset();
    public void addFact(Object fact);
    public void addFacts(Set facts);
    public void execute();
    public Set getFacts(Class type);
}

public interface RuleEnginePool {
    public RuleEngine getRuleEngine();
    public void returnRuleEngine(RuleEngine ruleEngine);
}
          




Add to these a few very light-weight implementation classes (and some Dependency Injection for good measure) and you have pretty much everything you could need from an integration standpoint.

public class JRulesRuleEngineFactory implements RuleEngineFactory {
    private final IlrRuleset _ilrRuleset = new IlrRuleset();

    public JRulesRuleEngineFactory(Reader reader) {
        if (!_ilrRuleset.parseReader(reader)) {
            throw new RuntimeException("Error parsing rules");
        }
    }

    public RuleEngine createRuleEngine() {
        return new JRulesRuleEngine(new IlrContext(_ilrRuleset));
    }
}

public class ThreadLocalRuleEnginePool implements RuleEnginePool {
    private final ThreadLocal _engines = new ThreadLocal() {
        protected Object initialValue() {
            return _ruleEngineFactory.createRuleEngine();
        }
    };

    private final RuleEngineFactory _ruleEngineFactory;

    public ThreadLocalRuleEnginePool(RuleEngineFactory ruleEngineFactory) {
        assert ruleEngineFactory != null : "ruleEngineFactory can't be null";
        _ruleEngineFactory = ruleEngineFactory;
    }

    public RuleEngine getRuleEngine() {
        return (RuleEngine) _engines.get();
    }

    public void returnRuleEngine(RuleEngine ruleEngine) {
        assert ruleEngine == getRuleEngine() : "ruleEngine not allocated from the current thread";
        ruleEngine.reset();
    }
}
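For completeness, here's roughly how calling code might use these interfaces. This is a sketch only: the interfaces are trimmed copies of the ones above with generics added, and InMemoryRuleEngine and SingleEnginePool are hypothetical stand-ins with one hard-coded "rule" so the example runs without a real engine behind it.

```java
import java.util.HashSet;
import java.util.Set;

// Trimmed copies of the interfaces above, with generics added for the sketch.
interface RuleEngine {
    void reset();
    void addFact(Object fact);
    void execute();
    <T> Set<T> getFacts(Class<T> type);
}

interface RuleEnginePool {
    RuleEngine getRuleEngine();
    void returnRuleEngine(RuleEngine ruleEngine);
}

// Hypothetical engine with a single hard-coded "rule": every negative
// Integer fact yields a String notification.
final class InMemoryRuleEngine implements RuleEngine {
    private final Set<Object> facts = new HashSet<>();

    public void reset() {
        facts.clear();
    }

    public void addFact(Object fact) {
        facts.add(fact);
    }

    public void execute() {
        Set<Object> inferred = new HashSet<>();
        for (Object fact : facts) {
            if (fact instanceof Integer && (Integer) fact < 0) {
                inferred.add("negative value: " + fact);
            }
        }
        facts.addAll(inferred);
    }

    public <T> Set<T> getFacts(Class<T> type) {
        Set<T> matches = new HashSet<>();
        for (Object fact : facts) {
            if (type.isInstance(fact)) {
                matches.add(type.cast(fact));
            }
        }
        return matches;
    }
}

// Hypothetical pool holding a single engine; enough to show the calling pattern.
final class SingleEnginePool implements RuleEnginePool {
    private final RuleEngine engine = new InMemoryRuleEngine();

    public RuleEngine getRuleEngine() {
        return engine;
    }

    public void returnRuleEngine(RuleEngine ruleEngine) {
        ruleEngine.reset();
    }
}

final class PoolUsageDemo {
    // Borrow, add facts, execute, read the results, and always give the engine back.
    static Set<String> check(RuleEnginePool pool, Object... facts) {
        RuleEngine engine = pool.getRuleEngine();
        try {
            for (Object fact : facts) {
                engine.addFact(fact);
            }
            engine.execute();
            return engine.getFacts(String.class);
        } finally {
            pool.returnRuleEngine(engine);
        }
    }

    public static void main(String[] args) {
        RuleEnginePool pool = new SingleEnginePool();
        System.out.println(check(pool, 3, -5)); // [negative value: -5]
        System.out.println(check(pool, 1));     // []
    }
}
```

The try/finally is the important bit: the engine goes back to the pool (and gets reset) even if a rule throws.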
          




We theoretically have pluggability of rule engines but to believe that this might be useful or even practical is naive at best.

Unfortunately, (or fortunately depending on your perspective) the biggest part of using a rule engine is in analysing and writing the rules themselves. Granted, many engines use a Rete Algorithm but to suggest that all Rete-based rule engines are the same is akin to saying that two Java applications are the same. JRules and JESS both use a Rete network and both are implemented in Java but the language, tools and behaviour (not to mention performance characteristics) of each differs sufficiently to render the conversion process rather less than trivial.

Surely, few of you would imagine that a switch from using JSPs to, say, Velocity in a system of any significant size would be an overnight job. Similarly, a switch from Struts to, say, Tapestry or JSF would be non-trivial. All of these technologies attempt to solve essentially the same problem but all come with a slightly different design philosophy. No matter how standardised the interface, if the behaviour of the system on the other side is different, the illusion breaks down.

Anything that lowers the barrier to entry for those wishing to explore the use of declarative rules is to be applauded; however, there are far more important problems for an organisation to solve than transparency of the underlying rule engine implementation. Atomicity, testing, manageability, education, analysis, configuration, understandability, to list but a few. It is no coincidence that these are largely non-technical.

What Does Business Analysis Really Mean

Many years ago now I read the first edition of About Face by Alan Cooper. At the time, Dave and I were working on an HR application which was subsequently sold to another company and is still being sold and (I presume) used today.

Not to ding my own chain (much!) but it still rates as one of the best apps I've ever built. I'm sure if I looked at the source code these days I would wince but I still think we made some pretty good technical achievements. However, technical merit aside, the one thing that really made an impression on me and continues to make an impression was the usability of the application. Not only in terms of business functionality but also the way it simplified the way users performed their day to day tasks.

Alan Cooper's most excellent insights were enlightening to me at the time and certainly influenced a lot of the design. But I can't take credit for the usability nor functionality of the application. No, for that we have Dave to thank. Besides having a brain the size of a planet, he is an exceptional business analyst. "Oh we have really good business analysts" I hear you cry. I'm sure you do, in which case you'll appreciate my definition of a business analyst.

Picking on Dave once again, he has an amazing ability to actually analyse a business. By this I mean he tries to really understand what the customer does; why they do it; whether their current business practices even make sense; how their shiny new software might actually make their life easier; and, most importantly, how to convey to (AKA convince) the customer why his ideas will work. I've heard of CEOs walking away from meetings asking their staff how this guy knows so much about their business. And I know for a fact that he had very little prior knowledge. He just knows how to ask the right questions to get to the heart of their business.

Traditionally (though I have little data to suggest this isn't still the norm) business analysts will sit with the customers and essentially document what the customer does. Workflow, day to day tasks, etc. From this they then write story cards or use cases (whatever is flavour du jour) that form the basis of the application design. These then go to the developers who consult with the customers on what exactly needs to be done, screen designs, etc. and then off they go to build the software. Unfortunately, the net result is usually a computerised version of some ancient manual system that is barely better than what they had and in many cases worse!

Maybe it's because the skills of which I write are rare but I'm not sure where the notion that customers should design the software comes from. The idea that customers know what they need (or even want) is just plain ludicrous. In most cases, business people understand what drives their business. They understand what their competitive advantage is and where they could gain new business if only they could do X or had Y. Surely it is the BA's job, nay duty, to come up to speed with the business and from that explain to the customers what would make their life easier. Surely that is where BAs add value. They understand software AND the business.

Even as a developer, I see it as my responsibility to go and talk to the BAs and customers if I see inconsistencies or if I think the application flow or business rules can be improved. I'm sure there are those who wish I'd shut up sometimes but I still think it's worth it. Which brings us right back to where we started. It's very rare that the end users can design a piece of software that actually does what they need but it's equally likely that developers will, on their own, design totally unusable software. So go read the book :-).

Oh, and this paper if you have the time.

Well Behaved Rules

I have previously made a comparison between rule engines (and the RETE algorithm in particular) and SQL databases. Business rule languages are declarative as is SQL, both being based on predicate calculus. Both suffer (or at least have suffered) similar problems in terms of performance and optimisation.

I recall many years ago, tuning my queries to within an inch of their (or more likely my) life. Re-ordering the WHERE clause, changing JOIN conditions, even changing the order in which columns were returned.

Thankfully these days, even the simplest of SQL database engines have some form of optimisation built-in. High-end systems such as Oracle have very sophisticated optimisation techniques. I can pretty much write any old SQL (with caveats) and know that I'll get at least acceptable performance in most cases.

The RETE algorithm (and its successor RETE-II) is amazingly good and rule engines have also come a long way but certainly not as much as ye-olde RDBMS. So there are still some things you need to consider when writing rules.

Without going into too much detail, the RETE algorithm builds a network of nodes representing the conditions of your rules and the matching facts. In general, the smaller the network, the better the performance.

The first thing to note is that any rules sharing common conditions are optimised into a single node. However, with many rule engines, this is sometimes only possible if the conditions are listed in the same order. So for any N rules having M conditions in common, order the conditions so that the first M are the same.

Now that your conditions are in the same order, you'll be interested to know that the exact order is in fact important. Because each condition is like an SQL JOIN, you need to place the MOST restrictive conditions first. That is, place the condition that is LEAST likely to be matched FIRST. This is no doubt familiar to anyone who has ever tuned SQL.
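A back-of-the-envelope way to see why order matters: the sketch below just counts condition evaluations when conditions are checked left to right and abandoned at the first failure. It is not a Rete network (where the saving shows up as fewer partial matches being stored and joined), and the two predicates are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

final class ConditionOrderSketch {
    // Matches 1 fact in 100: the most restrictive condition.
    static final IntPredicate RARE = n -> n == 42;
    // Matches 50 facts in 100: a much looser condition.
    static final IntPredicate COMMON = n -> n % 2 == 0;

    // Tests the facts 1..factCount against the conditions in the order given,
    // stopping at the first failure, and returns the total evaluations needed.
    static int evaluations(int factCount, List<IntPredicate> conditions) {
        int count = 0;
        for (int fact = 1; fact <= factCount; fact++) {
            for (IntPredicate condition : conditions) {
                count++;
                if (!condition.test(fact)) {
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(evaluations(100, Arrays.asList(RARE, COMMON))); // 101
        System.out.println(evaluations(100, Arrays.asList(COMMON, RARE))); // 150
    }
}
```

Both orders find the same single match; the restrictive-first version just throws away the other 99 facts sooner.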

Imagine we're trying to find two people with the same parent. We could do this (JRules code examples, just ask me if you want to see JESS as well):

?a: Person()
Person(getParent() == ?a.getParent())
          




This has one glaring problem: It's essentially a cross product! So we need to fix it:

?a: Person()
?b: Person(getParent() == ?a.getParent())
evaluate(?a != ?b)
          




Now, as we've shown above, the number of conditions evaluated is also important. Anytime we can short-circuit the conditions, we save ourselves another join. So once again, we can re-write our conditions:

?a: Person()
Person(?this != ?a; getParent() == ?a.getParent())
          




I've found these simple techniques can result in the difference between rules running in seconds versus OutOfMemoryErrors!

To be continued...

Business Rules Fallacy #1

Remember the good old days when Crystal Reports was going to save the world? That's right. By about now (2004) every user on the planet would be writing up their own ad-hoc reports straight from the database. All they needed was the database schema and away they would go.

What? You mean your business users aren't doing this? Really? Say it ain't so!

No, the truth is it never really eventuated the way we (the IT industry) had envisaged. End users just don't get fully normalised data structures. FWIW, most people I work with probably don't understand why 25,000,000 + NULL == NULL so how did we ever expect users to? Oh and let's not forget the IT manager who has enough knowledge of the system to be dangerous. He has a big picture of the database schema on his wall and knows just enough SQL to build queries that do multiple table scans over the millions of rows of inventory data, causing the DBMS to not so quietly tell every other user of the system to please get nicked :-) The plain fact is that writing reports typically requires so much help from IT infrastructure as to make it an IT task.

So now that Business Rules are looking more cylindrical with that silver sheen each day, some of us naively believe that our so called "Business Users" should be able to code up/modify rules for direct inclusion into a production system all by themselves. To me, this is an even bigger problem than reporting.

All the problems that plague end-user reporting apply. Users don't understand our lovely, normalised, domain model. They surely don't understand why they get a NullPointerException when adding numbers. And when it comes to knowing the difference between "and" and "or" you can forget it!

What's worse is that Business Rules are used directly by the application to "reason" on appropriate behaviour. By comparison, with the exception of the table scan problem and of course any bad decisions that might be made based on incorrect data, reporting seems rather innocuous.

Now, if I said to the man (and in this case yes it is a man) who writes the cheques, hey how about we don't test any of this code before we put it into production, I'd get the sack immediately. And quite rightly so I might add. Application components interact in subtle and non-obvious ways that necessitate large-scale unit/functional and integration testing. Business rules are no different. Actually they can be worse. We have many years of collective experience managing essentially procedural languages such as Java, C, C++, etc. Most developers I know think a lisp is a speech impediment and surely wouldn't know a prolog if they tripped over one :-)

Just like with reporting, we can try going down the path of writing views and building neato tools to try and make this stuff more like human readable languages and structures but when it comes down to it, like reporting, business rules require about the same (if not more) intervention from IT departments when end users write them as when the developers themselves write them. The "best" tools in the world won't solve real problems with allowing end users the ability to directly modify business rules. I mean, glasses aren't much good if the patient is blind.

Business rule maintenance is and should be an IT responsibility. However, the rules must be representable in a way that makes it easy for an end user to verify the translation from written/spoken languages. Appropriate use of the lower-level JRules or OPSJ languages makes this a reality without the need for tools to render rules from/to "plain English". With a tiny bit of coaching, our business users are finding they can understand the rules sufficiently to know when we've made a mistake. In fact this iteration, our business rep has indicated he'd like to try his hand at writing one. No points for picking the irony in that.

Business Rules Goodness - Continued

Continuing with the business rules examples thread, James looked at the examples of using logical to achieve a "compensating retraction" and asked me "so that's all very well and good but what happens if I'm not asserting a fact? What happens if instead, I'm sending an email to my broker?"

My first reaction was that this is a separate problem. The fact that I was potentially sending an email based on the SellOrder seemed like an implementation detail that I didn't want clouding the simple fact that, under certain conditions, I wanted to indicate my desire to sell.

After a little discussion, we came up with the following solution which, IMHO, elegantly maintains the atomicity of rules and the separation of concerns. I've taken some liberties with the syntax and I've not actually tried this in JRules but it does serve to demonstrate the concept:

rule BrokerInformedOnNewSellOrder {
    when {
        ?order: SellOrder();
        not SellOrderActive(order == ?order);
    } then {
        sendMessage("Sell stock");
        assert SellOrderActive(?order);
    }
}

          




As the name suggests, this simply informs the broker on any new SellOrder. The key here is that we introduce a new fact SellOrderActive. If we see an order that isn't active, we'll send a message and assert that it is now active.

(NB. sendMessage() isn't syntactically correct for JRules but you get the idea.)

Next we need to know what to do if the SellOrder is retracted (either explicitly or implicitly):

rule BrokerInformedOnRetractionOfSellOrder
{
    when {
        ?active: SellOrderActive();
        ?order: SellOrder() from ?active;
        not SellOrder(?this == ?order);
    } then {
        sendMessage("No longer sell stock");
        retract ?active;
    }
}

          




This rule says that anytime we think we have an active order but the SellOrder itself no longer exists, retract it and send another message to the broker indicating that we no longer wish to sell.
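The interplay between the two rules can be modelled in a few lines of Python. This is only a sketch of the idea, not how JRules evaluates anything: working memory is reduced to two sets of order ids, `fire_rules` is a made-up helper, and "sending" a message just appends to a list.

```python
def fire_rules(sell_orders, active, outbox):
    """One pass over both rules.

    sell_orders: set of order ids currently in working memory (SellOrder)
    active:      set of order ids the broker already knows about (SellOrderActive)
    outbox:      list collecting the messages that would be sent
    """
    # Rule 1: a SellOrder exists with no matching SellOrderActive.
    for order in sorted(sell_orders - active):
        outbox.append(("Sell stock", order))
        active.add(order)

    # Rule 2: a SellOrderActive exists but its SellOrder has gone away.
    for order in sorted(active - sell_orders):
        outbox.append(("No longer sell stock", order))
        active.discard(order)


sell_orders = {"ACME"}
active = set()
outbox = []

fire_rules(sell_orders, active, outbox)   # broker is told to sell
fire_rules(sell_orders, active, outbox)   # nothing new, so no second message
sell_orders.clear()                       # the SellOrder is retracted
fire_rules(sell_orders, active, outbox)   # broker is told to stop
print(outbox)
```

The marker set is what stops the message being sent again on every evaluation, and it is also what lets the second rule notice that an order it had announced has disappeared.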

As usual, I'll re-write this rule in JESS. Thanks go to the creator Ernest Friedman-Hill for clarification on the exact syntax:

(defrule broker-informed-on-new-sell-order
?order <- (sell-order)
(not (sell-order-active ?order))
=>
(sendMessage "Sell stock")
(assert (sell-order-active ?order)))

(defrule broker-informed-on-retraction-of-sell-order
?active <- (sell-order-active ?order)
(not ?order <- (sell-order))
=>
(sendMessage "No longer sell stock")
(retract ?active))

          




More Business Rules Goodness

I've always loved the idea of rule-based applications but never really had the opportunity to build one. And I have to say I'm having a lot of fun using a rules engine on this project. Since we dumped the BAL in favour of the IRL in JRules, productivity has skyrocketed. FWIW, TDD and rules-engines are a perfect match (a fact I'll blog more about in the coming days). I'm in geek heaven! About the only thing I wish I had now was an IntelliJ plugin (like the AspectJ one).

So anyway, my intention is that step-by-step, I'll try and document my progress starting with a little of what I discovered today by way of a rather contrived example:

rule MessageIsGeneratedOnSignificantStockMarketIndex {
    when {
        ?index: Index();
        evaluate(?index.value > 3000);
    } then {
        assert Message("Today the stock market rose above the psychological 3000 barrier");
    }
}

          




This example says that whenever the stock market index rises above 3000, we assert that it was a significant event. (Actually JRules has some other nifty stuff to do with associating timestamps with events but I'll blog about that another time.) The important thing to notice is the assert keyword. This asserts a new fact into the "knowledge base". This new fact will remain "forever" or at least until another rule retracts it.

Simple assertions such as this are great when you know that the asserted fact will always be true independent of the triggering condition. In the example, it will always be true that at some point in time, the stock market index rose to a significant level, even if the index drops again.

But what if we have a fact that only holds while the condition holds? In such a case, we'd need a "compensating" rule to retract the fact when the condition changed. This could get quite ugly. Thankfully, JRules provides a neat solution:

rule SellOrderRaisedWhenStockValueReachesMinimum {
    when {
        ?stock: Stock();
        evaluate(?stock.value >= 30);
    } then {
        assert logical SellOrder(?stock);
    }
}

          




This rule says that we will place a sell order for any stock that rises to 30 dollars. The key difference here is the use of the logical keyword. This tells JRules that the assertion only holds while the triggering condition holds. That is, while the stock value is at least 30 dollars, the sell order remains. However, if the stock value drops below 30 dollars, JRules will automatically retract the fact for us. What's even better is that if the assertion of the SellOrder causes other rules to fire and therefore assert more facts, all those that were declared as logical will be retracted as well. How cool is that?!
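If the cascading part sounds like magic, the bookkeeping behind it can be sketched quite simply. The Python below is a toy model of the idea and makes no claim about how JRules implements it: the `WorkingMemory` class, its method names and the string facts are all hypothetical, and each logical fact is assumed to have exactly one justification.

```python
class WorkingMemory:
    """Toy working memory that remembers why each logical fact exists."""

    def __init__(self):
        self.facts = set()
        self.supports = {}   # logical fact -> set of facts it depends on

    def assert_fact(self, fact):
        # Plain assert: the fact stays until someone retracts it explicitly.
        self.facts.add(fact)

    def assert_logical(self, fact, because):
        # Logical assert: the fact only holds while its supporting facts hold.
        self.facts.add(fact)
        self.supports[fact] = set(because)

    def retract(self, fact):
        if fact not in self.facts:
            return
        self.facts.discard(fact)
        self.supports.pop(fact, None)
        # Anything that was justified by this fact goes too, recursively.
        dependents = [f for f, deps in self.supports.items() if fact in deps]
        for dependent in dependents:
            self.retract(dependent)


wm = WorkingMemory()
wm.assert_fact("index rose above 3000")            # plain assert
wm.assert_fact("ACME value >= 30")                 # the triggering condition
wm.assert_logical("SellOrder(ACME)", because=["ACME value >= 30"])
wm.assert_logical("BrokerQueue(ACME)", because=["SellOrder(ACME)"])

wm.retract("ACME value >= 30")                     # the stock drops again
print(wm.facts)
```

After the retraction only the plainly asserted index message is left; the sell order and the fact that depended on it have both gone without any compensating rule being written.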

In our application, probably >99% of all rules will use the logical form of assert. This allows quite complex interactions between essentially independent rules.

If you find yourself having to structure your rules with priorities and worrying too much about the interactions between rules, it's likely your individual rules are doing too much. Ensure your rules are as atomic as possible. Separate "inference" rules from "do stuff" rules. And don't be tempted to simply change the state (ie property values) of existing facts. Instead, always assert new facts (as we have done in the examples above).

I thought I'd also show you the same rules using JESS.

(defrule message-is-generated-on-significant-stock-market-index
(index (value ?value))
(test (> ?value 3000))
=>
(assert (message (text "Today the stock market rose above the psychological 3000 barrier"))))

(defrule sell-order-raised-when-stock-value-reaches-minimum
?stock <- (logical (stock (value ?value)))
(test (>= ?value 30))
=>
(assert (sell-order (stock ?stock))))

          




You'll note that in JESS, the logical keyword is associated with the condition (or LHS) rather than the action (or RHS).

In many ways JESS provides a richer environment than JRules but I admit the syntax is less obvious to novice users.

BAL Cancer

Today I have to vent my spleen in the hope that I'll save some poor hapless souls from venturing down the frivolous path that is the JRules Business Action Language (BAL).

First up, a recap. If you haven't read any of my previous blogs on the subject, we're in the process of converting all our validation to JRules and we have some, IMHO, pretty cool stuff happening now but I'll blog about that another time. The rules are in a repository because realistically the only way to write them using the BAL is using the repository (and associated tools). We need to use the BAL because it's the business user friendly language and without it most if not all the business case for using JRules in the first place goes out the window. I estimate that we would be in the order of 5 times more productive if we could write these by hand using IRL in a text file!

Having all our rules in the repository makes it almost impossible to be agile/iterative/whatever you want to call it. Forget having multiple developers modify them because the rules are not maintained on an individual basis (ie they're essentially all in one dirty great big file). Forget being able to apply domain refactoring (namely rename and move). And who the hell wants to write a custom Ant task to get the damn things out of the repository to do any kind of testing on them? Believe me when I tell you that we tried every which way to skin this little cat. Every time we came up with a solution to one problem, another presented itself. It's not that we couldn't go ahead and write the rules mind you. It's just that we knew if we did we would be heading down a path that would ultimately violate the entire premise on which we had based our approach. The greatest minds on the planet couldn't have helped get around the plain and simple fact that the BAL should be classified as legally dead!

And so it was with great pleasure that we called in the project oncologist and removed the festering pustule that is BAL, hopefully, once and for all! From now on we will be maintaining the rules by hand in a text file (just like java). Because it's a text file, CVS and IntelliJ can do their merging magic (just like java). Because it's a text file, we can rename methods, classes, etc. with ease (just like java). The list goes on but I'll spare you :-)

The upshot is that I managed to get 2 days worth of (previously estimated) work done in around 30mins! Just to be sure, I passed around a print-out of the rules (now in IRL) to garner people's opinions and EVERYONE agreed: It was good! Oh, and readable. Which was actually going to be my point :-) What's even better is that almost straight away, people noticed a few mistakes in the rules that nobody had noticed when they were written in the "nice and fluffy" language.

I mean, when it comes down to it, the idea that end users will modify these rules is about as likely as the idea that end users can just create their own Crystal Reports straight from the database - YEAH RIGHT!!! Even if they do want to, I reckon 30 mins of training would bring our business analysts up to speed with writing IRL. In the worst case, we can always convert rules from IRL to BAL on an as-needed basis if/when there is a requirement to do so.

In summary, JRules as an engine is great but the BAL sucks the big one! IRL IS readable (with a teeny bit of extra work) by end users and works great in an agile environment such as ours. BAL doesn't. It (BAL) might work once you have a mature application but we don't. Tools such as Eclipse and IntelliJ are making agile development that much easier. JRules IDE (as distinct from the engine which is just fine) is definitely a step backwards.

JRules - The story so far...

I wrote previously on my initial investigation into JRules. Having been using and playing with it in a production code environment for a bit now I thought I'd follow up with some observations thus far. Some of these no doubt relate to other rule engines such as Drools but as I've never used them in the real world, so to speak, I'll have to wait to hear back on that.

One of the biggest hurdles we had to face was the so called Business Object Model (BOM). This is a mapping from your domain model to user-friendly "natural language" expressions. Our first cut on this was to load up the model in the rule editor and start mapping away. This soon became tiresome, error prone and very, very brittle. Even slight changes in the hierarchy meant wiping out rules and starting all over again.

Next we tried generating the BOM directly from our domain model but that too smelled of maintenance nightmares. So instead, we simplified the whole process by effectively flattening the hierarchy. We represent as many facts as we can at the highest possible level by "exploding" the domain model (for rule execution only). We also assert services such as navigation state, current time, calculation engines, etc. in the same way.

At first this seems odd, especially to a hardened OO developer. But after some time it soon becomes apparent that it makes a lot of sense. To start with, it's much easier to do the mapping. We no longer need to worry about the relationships between classes and can focus on mapping the properties of individual classes. This makes our mapping much less susceptible to change. It also becomes much easier for the end users because they tend to think of pieces of information (facts) with less regard to relationships than we do. As in "the customer's address" as opposed to "the address of the customer of the order".
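As a rough illustration of what "exploding" might look like, here is a small Python sketch. It is one possible reading of the idea rather than our actual code: the `Order`, `Customer` and `Address` classes and the `explode` helper are invented, and the only thing it does is walk an object graph and hand back every object as a top-level fact ready to be asserted.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Address:
    city: str


@dataclass
class Customer:
    name: str
    address: Address


@dataclass
class Order:
    number: int
    customer: Customer


def explode(root):
    """Return every domain object reachable from root as its own fact."""
    facts = []
    seen = set()            # object ids, so shared objects are listed once
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        facts.append(obj)
        for field in dataclasses.fields(obj):
            value = getattr(obj, field.name)
            if dataclasses.is_dataclass(value):
                stack.append(value)
    return facts


order = Order(42, Customer("Jo", Address("Melbourne")))
facts = explode(order)
print([type(fact).__name__ for fact in facts])
```

A rule can then match on an Address fact directly instead of navigating from the order to the customer to the address, which is where the mapping effort used to go.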

One huge bonus is that we no longer have to spend large amounts of time making the "nice" natural language mappings actually understandable. By that I mean tweaking the "translation" so that it reads in plain English. Because there is far less navigation going on, we decided we can probably not even worry about the language mapping until later, making it much faster for developers to work.

As an aside, I find it incredibly annoying that JRules doesn't seem to allow you to reference constants ( public static final fields). Instead we are forced to create virtual methods for each of them with a translation back to the constant. DUH! Please someone tell me it ain't so.

Don't allow the rules to hit the database if at all possible. Not something we even entertained but another project in the same building is doing this and what do you know, it performs like crap. Relatively speaking that is. We're using Hibernate so the intention is to have all our reference data modelled as objects, cached and asserted just like all the other facts.

The cool thing is that all this is kinda like good old IoC/Dependency Injection/Whatever it's called this week. Nomenclature aside, you tell the rules everything they need to know. They don't call out to services by calling a ServiceLocator. They don't call static methods, etc. Instead they simply declare they need a particular service or are interested in a particular fact, and the rule engine does the plugging. Neat, and for all the same reasons that IoC is good for "normal" java code.

IlrRuleset and IlrContext are relatively expensive to create so we are using a very simple pooling mechanism to manage them. Don't forget to reset the context when it's returned to the pool or it'll hold references to objects that you probably no longer care about.
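The pooling itself is nothing fancy. The Python sketch below shows the shape of it under some loud assumptions: `FakeContext` is a hypothetical stand-in for an expensive working-memory object (it is not the JRules API), `ContextPool` and its method names are made up, and there is no thread safety or size limit.

```python
class FakeContext:
    """Hypothetical stand-in for an expensive-to-create working memory."""

    def __init__(self):
        self.facts = []

    def assert_fact(self, fact):
        self.facts.append(fact)

    def reset(self):
        # Drop every reference so old facts can be garbage collected.
        self.facts.clear()


class ContextPool:
    """Very simple pool: reuse idle contexts, create one only when needed."""

    def __init__(self, factory):
        self._factory = factory
        self._idle = []

    def borrow(self):
        return self._idle.pop() if self._idle else self._factory()

    def give_back(self, context):
        # Reset on return, otherwise the pooled context keeps old facts alive.
        context.reset()
        self._idle.append(context)


pool = ContextPool(FakeContext)
first = pool.borrow()
first.assert_fact("order 42")
pool.give_back(first)

second = pool.borrow()
print(second is first, second.facts)
```

Doing the reset inside `give_back` rather than leaving it to the caller means nobody can forget it, which is exactly the mistake that turns a pool into a memory leak.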

Another thing we found was that it's best that the application not worry too much about what rules may/may not be applicable. Similarly, don't try and put too much "behaviour" into the rules. It's tempting to want to tell the rules engine (either explicitly or via some facts) that certain rules don't apply. The problem here is that we start to embed knowledge of our application into the rules and vice-versa.

Let the rules do their job. Namely, assert and retract facts based on certain conditions being met (or not as the case may be). The application is then responsible for performing some kind of action based on the state of the facts after firing the rules. This is probably somewhat controversial as it would be totally possible to implement an entire system in rules. (Actually, not only would it be possible but it would be pretty cool too). However in our case, and no doubt in yours, the rules are not application specific. In our application they represent company-wide knowledge about the way they do business. In fact the longer term plan is to publish these rules as an enterprise accessible service.

And finally, the non-deterministic execution of methods on your domain model during rule evaluation. By this I mean you can't guarantee that a given method will be invoked once or many times (or at all?) and especially not in what order. This isn't usually a cause for concern as you will probably be calling simple getters but in the few instances that isn't the case beware. Even something as innocuous as obtaining the current time can be an issue. More often than not, you want to treat the rules as a batch to be evaluated and fired against a fixed point in time. So for example, we assert a Clock implementation that returns the same value whenever (ie. no matter how many times) getCurrentTime is called.
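A fixed clock of this kind is about as small as a class gets. Here is a sketch in Python; the `FixedClock` name and the `get_current_time` method are illustrative only and simply mirror the getCurrentTime idea described above.

```python
from datetime import datetime, timezone


class FixedClock:
    """Captures the time once and returns that same value on every call."""

    def __init__(self, now=None):
        # Take the timestamp when the clock is created, or accept one
        # explicitly (handy in tests).
        self._now = now if now is not None else datetime.now(timezone.utc)

    def get_current_time(self):
        return self._now


clock = FixedClock()
first = clock.get_current_time()
second = clock.get_current_time()
print(first == second)
```

Create one per batch of rule firing and assert it along with the other facts; however many times the engine decides to call it, every rule sees the same instant. Passing an explicit timestamp also makes time-dependent rules trivial to test.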