Michael Oara
CEO, System Renewal, Inc.
Abstract
Harvesting business rules from a legacy application has proved to have many benefits, from simple documentation to legacy modernization. Rule mining may use a number of approaches, as well as combinations of them. In most approaches, business rules are recovered from the code in a bottom-up fashion. This creates difficulties in classifying them and understanding the context in which they are used. This article suggests a practical approach for collecting rules in a process context. Process information is collected top-down and rules are collected bottom-up, but the two can be connected so that the analyst can build a better picture of the business functions of the application.
Harvesting business rules from a legacy application has proved to have many benefits. In the code of a classical legacy application the business and technical aspects are in most cases mixed together, in a way that makes it very hard to distinguish the technical mechanisms from the implementation of the company’s business rules and processes. Collecting the real business rules in a methodical and, if possible, exhaustive manner yields many advantages.
Understanding the benefits is just half of the equation, the other half being the capacity to efficiently and accurately mine the rules from the application. Various approaches exist, starting with user interviews and ending with automatic mining from the code. A key point of rule mining is a proper classification of the resulting rules. Absent such a classification, the rules will sit in an amorphous collection of little use.
Classification may be defined across many independent dimensions. We may talk about validation vs. computation rules, client vs. server rules, or simple vs. compound rules. One problem with these classifications is that they do not answer a very basic question: in which business circumstances is a rule invoked? In other words, we would like to know the relationships between individual rules and the processes of the enterprise.
To address this issue, we’ll look first at the common ways in which rules are collected.
Rule mining may proceed in a manual or automatic fashion, each method having advantages and disadvantages.
In an automatic approach, a specialized rule harvesting software tool performs queries against the application code and attempts to locate the fragments of code that implement business rules. This is quite a tricky process, as it is sometimes hard to draw a clear demarcation between the “business rule code” and code that simply creates the environmental mechanisms needed for the program to execute correctly. As the two may be easily confused, it is possible to obtain a large number of false positives (code that is mistakenly designated as a business rule implementation) as well as to miss important rules. The antidote to this weakness is to use flexible search criteria, which can be manipulated by an analyst and at the same time reflect particulars of the application. As an example, the analyst may notice that in a particular application all input validation rules reside in paragraphs or routines whose names contain the string “-EDIT-”, thus providing a good search criterion.
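The “-EDIT-” criterion can be illustrated with a minimal Python sketch. It is not taken from any particular rule harvesting tool: the Cobol fragment, the paragraph names, and the `find_paragraphs` helper are all hypothetical, and the sketch assumes that a paragraph header is a single name followed by a period on its own line (so one-word statements such as `GOBACK.` would be misread as headers).

```python
import re

# Hypothetical Cobol fragment; the paragraph names are made up for illustration.
SAMPLE_SOURCE = """\
       2000-PROCESS-ORDER.
           PERFORM 2100-EDIT-QUANTITY.
           PERFORM 2200-WRITE-ORDER.
       2100-EDIT-QUANTITY.
           IF ORDER-QTY NOT NUMERIC
               MOVE 'INVALID QUANTITY' TO MSG-TEXT.
       2200-WRITE-ORDER.
           WRITE ORDER-RECORD.
"""

# Assumption: a paragraph header is one name followed by a period, alone on its line.
PARAGRAPH_HEADER = re.compile(r"^\s*([A-Z0-9][A-Z0-9-]*)\.\s*$")


def find_paragraphs(source, marker="-EDIT-"):
    """Return (name, first_line, last_line) for paragraphs whose name contains marker."""
    lines = source.splitlines()
    headers = []
    for number, line in enumerate(lines, start=1):
        match = PARAGRAPH_HEADER.match(line)
        if match:
            headers.append((match.group(1), number))

    results = []
    for index, (name, start) in enumerate(headers):
        # A paragraph runs to the line before the next header, or to the end of the source.
        if index + 1 < len(headers):
            end = headers[index + 1][1] - 1
        else:
            end = len(lines)
        if marker in name:
            results.append((name, start, end))
    return results


for name, first, last in find_paragraphs(SAMPLE_SOURCE):
    print(f"{name}: lines {first}-{last}")
```

Recording the line range of each candidate, rather than only its name, keeps the location information that later allows rules to be related to other code artifacts.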
Searches through application code cannot be simply text-based. Ideally, to zero in on the code that implements the business rules, more sophisticated searches are needed, based on the grammar of the language in which the code is written. The examples below illustrate the power of such syntax-based queries in a Cobol/CICS program:
Screen validations
This query helps find all tests against variables that receive values from a screen.
This query finds all tests that result in the invocation of a program that displays all available item types.
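To give a feel for what the first kind of query involves, here is a rough Python approximation. It is only a sketch under strong assumptions: the Cobol fragment is invented, the set of screen input fields (`ITEMTYPI`, `QTYI`) is supplied by hand rather than derived from map definitions, statements are assumed to fit on one line, and control flow is ignored. A real tool would work from a full parse of the language grammar rather than from regular expressions.

```python
import re

# Hypothetical Cobol fragment; field and paragraph names are made up.
SAMPLE_SOURCE = """\
           MOVE ITEMTYPI TO WS-ITEM-TYPE.
           MOVE ZERO TO WS-COUNTER.
           IF WS-ITEM-TYPE = SPACES
               PERFORM 3000-SHOW-ITEM-TYPES.
           IF WS-COUNTER > 10
               PERFORM 9000-ABEND.
           IF QTYI NOT NUMERIC
               MOVE 'BAD QUANTITY' TO MSGO.
"""

# Simplified statement shapes (assumption: one statement per line, single-word operands).
MOVE_STATEMENT = re.compile(r"^\s*MOVE\s+(\S+)\s+TO\s+([A-Z0-9-]+)\.?\s*$")
IF_STATEMENT = re.compile(r"^\s*IF\s+(.*)$")
IDENTIFIER = re.compile(r"[A-Z][A-Z0-9-]*")


def find_screen_tests(source, screen_fields):
    """Return (line_number, text) for IF statements that test screen-derived variables."""
    derived = set(screen_fields)
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        move = MOVE_STATEMENT.match(line)
        if move:
            sender, receiver = move.groups()
            # A variable that receives a screen-derived value becomes screen-derived itself.
            if sender in derived:
                derived.add(receiver)
            continue
        test = IF_STATEMENT.match(line)
        if test and derived.intersection(IDENTIFIER.findall(test.group(1))):
            hits.append((number, line.strip()))
    return hits


for number, text in find_screen_tests(SAMPLE_SOURCE, {"ITEMTYPI", "QTYI"}):
    print(number, text)
```

The test on `WS-COUNTER` is not reported because that variable never receives a value from a screen field, which is the kind of distinction a purely textual search for `IF` could not make.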
In a manual approach, all stakeholders of the application may be interviewed and solicited to list and explain the rules of which they are aware. Rules may also be collected from existing documentation or from other sources.
Manual and automatic methods for rule mining each have their advantages and disadvantages, as described in the table below:

| | Advantages | Disadvantages |
|---|---|---|
| Manual | Rules are expressed in clear business language. Rules not enforced by the application are also discovered, opening the opportunity to improve the application. | Rules implemented in the code may be missed. There is no information about the actual location of a rule’s implementation. The rule collection process is long, costly and inefficient. |
| Automatic | Implementation code of each rule is identified and recorded. Rule collection is fast and efficient. | False positives: some technical mechanisms are mistakenly identified as rules. Missing rules: when the search criteria are not sufficiently refined or sophisticated. A clear business description of rules cannot be automatically determined. |
As the two methods complement each other, it is natural that some combination of the two is the best strategy to collect the rules. In such an approach, automatic mining is performed first, followed by manual review, refinement and additional specification. Using both methods (automatic and manual) ensures that false positives are removed and that the rules are properly documented, with references to their actual implementation in the code.
While some degree of automation is necessary to make rule harvesting cost-efficient and practical, it has the disadvantage that rules are collected without a clear process context. The analyst may discover a rule that declares “a 10% discount is given for all orders over $100,” but have little or no knowledge of the circumstances in which the rule comes into play. Such “circumstances” are best described as use cases or activities that could be specified, for instance, in UML diagrams.
One may observe a certain symmetry between business rules and processes. Business rules are declarative (e.g., “customer must be over 18”); process diagrams are prescriptive (e.g., “receive payment, then send order”). Business rules are implemented in relatively short fragments of code; processes are implemented through series of programs and user interfaces. For all these reasons, it is natural that processes are collected “top-down,” while rules are collected “bottom-up.” An analyst would start by describing the main use cases and then detail them in activity diagrams. Business rules, on the other hand, are discovered at a lower level, as detailed policy implementations.
| | Processes | Business Rules |
|---|---|---|
| Nature | Prescriptive | Declarative |
| Implementation | In execution flows | In fragments of code |
| Discovery | Top-down | Bottom-up |
Just as business rules may be harvested from the application code following a particular methodology, processes may be harvested from the application in a similar fashion. A combined approach, in which the analyst looks at the existing application artifacts and combines that knowledge with descriptions obtained from the application stakeholders, would render the most complete picture. The benefit of code analysis (manual or with the aid of some automation) is that discovered processes may always be related back to the code artifacts that implement them.
The symmetry between processes and rules gives us a picture that begs for a linkage between the two aspects of the application. The linkage between these two provides additional information about both the business processes and the application that implements them.
We can formulate this linkage in these terms:
Rather than a simple activity diagram or a flat collection of business rules, one would wish to see a diagram more like this one.

Both the process modeler and the business rules modeler would benefit from this link.
The process analyst would be able to answer the question: “What are the important details of the implementation of this process?”
The business rules analyst would be able to answer the question: “When is this rule used?”
As we come to the main point of this article, we assert that harvesting business rules and processes from code makes the linkage between the two easier to accomplish. Each process, as well as each rule, would have an attribute that describes its location in the code.
Rules

| Business Rule | Program | Lines |
|---|---|---|
| BR1 | P1 | 810-815 |
| BR2 | P1 | 890-900 |
| BR3 | P2 | 1320-1325 |

Processes

| Process | Program | Lines |
|---|---|---|
| Activity A1 | P1, P2 | 750-900, 1200-1600 |
| Activity A2 | P3 | 680-830 |
Even if two separate teams harvest the processes and rules, it is now possible to link the two by programmatically joining the two tables. One may notice, for example, that the business rule BR1 is used in activity A1, because the code implementation of this business rule is included in the code implementation for the activity. In reality, such connections may be more complex, but a more complete model may be inferred, as in figure
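The join of the two tables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Location` type and the `link_rules_to_processes` function are hypothetical names, and the sketch reads the Processes table as pairing program P1 with lines 750-900 and program P2 with lines 1200-1600 for activity A1. A rule is linked to an activity when its line range lies entirely within one of the activity's ranges in the same program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A span of code: a program name and an inclusive range of line numbers."""
    program: str
    first_line: int
    last_line: int

    def contains(self, other):
        # True when `other` lies entirely inside this span of the same program.
        return (self.program == other.program
                and self.first_line <= other.first_line
                and other.last_line <= self.last_line)


# Data from the Rules table.
RULES = {
    "BR1": Location("P1", 810, 815),
    "BR2": Location("P1", 890, 900),
    "BR3": Location("P2", 1320, 1325),
}

# Data from the Processes table (the program/line pairing for A1 is an assumption).
PROCESSES = {
    "A1": [Location("P1", 750, 900), Location("P2", 1200, 1600)],
    "A2": [Location("P3", 680, 830)],
}


def link_rules_to_processes(rules, processes):
    """Return (rule, process) pairs where the rule's code lies inside the process's code."""
    links = []
    for rule_name, rule_location in rules.items():
        for process_name, spans in processes.items():
            if any(span.contains(rule_location) for span in spans):
                links.append((rule_name, process_name))
    return links


for rule_name, process_name in link_rules_to_processes(RULES, PROCESSES):
    print(f"{rule_name} is used in activity {process_name}")
```

With the sample data, all three rules fall within activity A1 and none within A2. Strict containment is the simplest possible join condition; partial overlaps, or rules reached indirectly through called routines, would need a richer model.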

Harvesting business rules and processes from a legacy application may require various skills and, as such, may involve multiple people or teams. Assuming that rule harvesting and process harvesting software is available, a project may be organized along the following lines:
Such an approach would result not only in a good functional specification of the application, but also in a practical way of organizing a modernization project.