I've been ask this question dozens of times in the last 3 years and have tried to explain it several ways. The trick about RETE is it goes counter to programmer intuition. Of the two, most people seem to understand why Beta memory (join memory) is useful. More often than not, the question about Alpha memory is this "wouldn't that use more memory and make the rule engine slower?" The answer to that question is yes and no.
Yes, it would take more memory, but what happens when facts change? Lets think of it this way. I'll use a compliance scenario.
Rulebase: 400 rules
Accounts: 50K
Positions: 1million
distinct securities 64K
If the system needs to check firm wide compliance rules, it means loading 1 million+ rows of data. Assuming we have a big server with 10Gb of RAM, loading this much data isn't a problem at all. In fact, I've managed to load 1 million rows of data on 2Gb of RAM.
As the facts propogate through the network, what happens? In the case where alpha nodes do not store alpha memory, it's up to the join nodes to remember which facts matched. When we go to modify a fact, it has to re-evaluate all alpha nodes to figure out which join nodes stored beta memory. If you don't re-evaluate the alpha nodes, it means the rule engine has to check every join node for that given object and remove all entries.
If the rule engine stores the alpha node, what happens? When a fact is modified, the retract is propogated through the network. The first alpha node of each sequence of alpha nodes, checks to see if the fact is in its index. If it is, it removes the fact and propogates. If it isn't, that entire branch is skipped. This is what Forgy meant when he claimed Rete doesn't need to iterate over the facts.
Storing the alpha memory means the rule engine doesn't need to traverse every node for that object. In practice, this means it reduces the number of times facts are evaluated by 1,000-100,000 times compared to non-Rete rule engines. Having compared procedural rule engines vs JESS, as the dataset increases by 10x, the performance difference increases by 100-1000x. This means rules that take JESS 1 minute to evaluate can will take several hours using a procedural rule engine.
The new implementation of Rete for drools implements both alpha and beta memory. The alpha nodes simply use a list to remember which facts matched. The beta nodes use the alpha and beta memory to remember which facts have entered and matched. One important distinction is that facts that have entered a beta node doesn't necessarily mean it matched.
Hopefully, this provides a decent explanation of alpha memory and why it is important for scalability and performance.


