Automation vs. Quality in Legacy Modernization

 

Michael Oara

CEO, System Renewal, Inc.

 

 

 

Abstract

 

As the applications of the ’70s and ’80s become obsolete from both a business and a technological point of view, more and more IT shops are queuing up application modernization projects. Such projects are complex, costly and risky, so some degree of automation in the transformation process is very attractive. Automated transformation has both advantages and disadvantages. In general, a tension exists between the degree of automation and the quality of the results. This article explores the opportunities and the issues of automated transformation and suggests a number of solutions that can be combined into a coherent strategy.

 

 

 

As the applications of the ’70s and ’80s become obsolete from both a business and a technological point of view, more and more IT shops are queuing up application modernization projects. Such projects are complex, costly and risky, so some degree of automation in the transformation process is very attractive. Looking only at the costs, one may estimate that rewriting the application from scratch in a new environment, without any automation, would entail at least roughly the same effort and cost as building the original. Not only are the costs prohibitive, but the extended development time may render the new incarnation of the application obsolete even before it is used in production. Automation appears to be one way to overcome these difficulties.

 

Where can automated transformation play a role? In an ideal scenario, automation would be like a black box for which the legacy application written in the 70s is the input and the new, modern application is the output. Even such an extreme scenario is not ideal, as in most cases there is also an effort to change not only the technological platform, but some of the functionality, adding new features and making it more flexible. Thus, from the very beginning we may settle for less than 100 percent automation of the transformation process, although we may keep the 100 percent as a measuring stick.

 

The role of preliminary steps

 

Various automation methods may be employed, with varying degrees of success.  As a common denominator, all require the preliminary step of analysis, which not only helps decide the best strategies, but gathers information needed for transformation. An application clean-up is also useful, as it simplifies the application and eliminates the dead wood that does not have to be carried over to the new platform.

 

Analysis

 

Analysis may be a vague concept, but in the context of automated legacy transformation it must take more precise forms. One crucial factor is that the results of analysis must be captured in a form accessible to transformation software. If all the information is simply accumulated in a series of free text documents and nice pictures, there is little chance of using it as an input to the transformation process. Ideally, the results of analysis should be accumulated in a well-structured repository, where they can be accessed programmatically.

 

A good analysis may reveal general information used to estimate the magnitude of the transformation project and to decide the best means to perform it. This includes:

 

·         The size of the application and its individual components

·         The complexity of the programs

·         A classification of the artifacts along various categories

·         The interaction between the user interface, the programs and the data

 

More detailed information could be used for the actual automation of the transformation process. Such detailed information may include:

 

·         Data model, layouts and data structures used in programs

·         Screen contents and layouts

·         Program syntax trees

·         Screen flows

·         Call maps

 

Having gathered all this data, the team in charge of transforming the application may choose the best transformation strategy. In addition, all the knowledge captured in a repository could be used in the next major step, the actual conversion of code and data.
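
As an illustration of what "accessible to transformation software" can mean in practice, the following minimal sketch stores a few analysis results in an in-memory SQLite database and queries them programmatically. The table layout, the function names and the sample artifacts (ACCT001, ACCTMAP and so on) are assumptions made for this example; they are not taken from any particular analysis tool.

```python
import sqlite3

def build_repository(artifacts, calls):
    """Store analysis results in an in-memory SQLite database.

    artifacts: iterable of (name, kind, lines_of_code)
    calls:     iterable of (caller, callee)
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE artifact (name TEXT PRIMARY KEY, kind TEXT, loc INTEGER)")
    db.execute("CREATE TABLE call_map (caller TEXT, callee TEXT)")
    db.executemany("INSERT INTO artifact VALUES (?, ?, ?)", artifacts)
    db.executemany("INSERT INTO call_map VALUES (?, ?)", calls)
    db.commit()
    return db

def size_by_kind(db):
    """Total lines of code per artifact kind, useful for project estimation."""
    rows = db.execute("SELECT kind, SUM(loc) FROM artifact GROUP BY kind ORDER BY kind")
    return dict(rows.fetchall())

def callers_of(db, program):
    """Programs that call the given program, according to the call map."""
    rows = db.execute(
        "SELECT caller FROM call_map WHERE callee = ? ORDER BY caller", (program,)
    )
    return [row[0] for row in rows]

# Hypothetical sample data standing in for the output of an analysis tool.
repo = build_repository(
    artifacts=[
        ("ACCT001", "program", 1200),
        ("ACCT002", "program", 800),
        ("ACCTMAP", "screen", 150),
    ],
    calls=[("ACCT001", "ACCT002")],
)
print(size_by_kind(repo))
print(callers_of(repo, "ACCT002"))
```

The same repository could then feed both the estimation work (sizes per category) and the conversion tools (call maps, layouts), which is the point of keeping the results structured rather than in free text.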

 

The good news is that a great deal of analysis may be automated. There are software vendors with tools capable of taking in the sources of the application and collecting all of the data listed above, in an automated fashion. Such tools differ in their abilities to capture, store and expose the information, so a legacy transformation team must choose the one best suited for the following steps.

 

Clean-up

 

A legacy application that grows organically for many years inevitably accumulates artifacts that are totally superfluous. Such artifacts may include programs, jobs or files no longer used or even dead code in programs otherwise utilized in the application. Most of the clean-up operation may be fully automated, thus saving time and resources in transformation projects.

 

In an actual project in which this author participated, hundreds of reports in a banking application were classified as obsolete. Starting from the list of obsolete reports, an automatic method was devised that removed all the programs generating them, along with many other artifacts such as JCL jobs, sort cards and intermediate files. The much reduced set of remaining report-producing programs was then converted from Cobol to Crystal Reports.
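
A clean-up of this kind can be sketched as a simple propagation over a dependency map: an artifact becomes removable once everything it exists to serve is removable. The code below is a minimal sketch under that assumption; it is not the method used in the project described above, and the artifact names are hypothetical.

```python
def removable_artifacts(serves, obsolete):
    """Return all artifacts that can be dropped along with the obsolete ones.

    serves:   dict mapping an artifact to the set of artifacts it exists for
              (a program serves the reports it produces, a job serves the
              programs it runs, a sort card serves its job, and so on).
    obsolete: set of artifacts already classified as obsolete.
    """
    removable = set(obsolete)
    changed = True
    while changed:  # iterate until no further artifact can be added
        changed = False
        for artifact, targets in serves.items():
            if artifact not in removable and targets and targets <= removable:
                removable.add(artifact)
                changed = True
    return removable

# Hypothetical example: RPT1 is obsolete, RPT2 is still in use.
serves = {
    "PGM1": {"RPT1"},           # produces only the obsolete report
    "PGM2": {"RPT1", "RPT2"},   # also produces a live report, so it stays
    "JOB1": {"PGM1"},
    "SORT1": {"JOB1"},
    "JOB2": {"PGM2"},
}
print(sorted(removable_artifacts(serves, {"RPT1"})))
```

Note that PGM2 and JOB2 are kept because they still serve a live report; only artifacts whose every purpose has disappeared are removed.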

 

Restructuring

 

The source application may be further improved to ease the transformation effort by applying some preliminary architectural improvements. Such improvements may include modularization and the standardization of data names and programming style. These activities lend themselves to a large degree of automation and improve the efficiency of the actual transformation steps.

 

Major automation strategies

 

One may of course employ a completely manual transformation strategy. This may be the

We may classify automatic transformation strategies in three broad categories.

 

·         Code-Code

In code transformation, a software tool uses existing code to generate new code. Programs written in a particular Cobol dialect may be automatically converted to another Cobol dialect or even to modern languages such as Java. CICS or IMS screens may be automatically converted to HTML or other modern UI formats. VSAM access methods may be automatically converted to SQL.

 

·         Code-4GL-Code

In this approach the application code is initially used to generate 4GL code, which hides some of the technicalities of the original programs and raises them to a somewhat higher level of abstraction. The 4GL code is then used to generate actual compilable code in various languages, specific to the environments in which the application will run.

 

·         Code-Model-Code

In model transformation, a model is extracted from the legacy application and used to generate new code in a modern environment.

 

As one moves from manual transformation to the “code to code” method and beyond, more and more details of the original code are omitted in favor of a clear and concise model. The code quality of the final results increases, but more manual work is required at the end to fill in the details lost in raising the level of abstraction.
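
To make the "code to code" end of this spectrum concrete, here is a toy statement-level translator from a few Cobol verbs to Java assignments. It is only a sketch: the rule set, the naming convention and the function names are assumptions for illustration, and a real converter would work from a full syntax tree rather than regular expressions. It keeps every detail of the source, and it also produces exactly the kind of literal, Cobol-shaped Java that a model-based approach tries to avoid.

```python
import re

def java_name(cobol_name):
    """Turn a Cobol data name such as WS-TOTAL-AMT into wsTotalAmt."""
    head, *rest = cobol_name.lower().split("-")
    return head + "".join(part.capitalize() for part in rest)

def operand(token):
    """Numeric literals pass through unchanged; data names are renamed."""
    return token if re.fullmatch(r"-?\d+(\.\d+)?", token) else java_name(token)

# Each rule pairs a pattern for one Cobol statement with a Java emitter.
RULES = [
    (re.compile(r"^MOVE (\S+) TO (\S+)$"),
     lambda m: f"{java_name(m[2])} = {operand(m[1])};"),
    (re.compile(r"^ADD (\S+) TO (\S+)$"),
     lambda m: f"{java_name(m[2])} += {operand(m[1])};"),
    (re.compile(r"^SUBTRACT (\S+) FROM (\S+)$"),
     lambda m: f"{java_name(m[2])} -= {operand(m[1])};"),
]

def translate(statement):
    """Translate one Cobol statement, or flag it for manual conversion."""
    text = statement.strip().rstrip(".")
    for pattern, emit in RULES:
        match = pattern.match(text)
        if match:
            return emit(match)
    return f"// TODO manual conversion: {text}"

for line in ["MOVE WS-RATE TO WS-CURRENT-RATE.", "ADD 1 TO WS-COUNT.", "PERFORM 2000-CALC."]:
    print(translate(line))
```

Statements the rules do not cover are marked rather than guessed at, which mirrors the manual work that remains after any partial automation.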

 

Code transformation

 

Because of its inherent difficulties, little automated code transformation is used today. Both software vendors and internal IT teams have attempted to automate at least to some degree. Such attempts run into a series of problems springing from the large gap between old and new. Each gap creates its own challenges and together they make the task even more difficult. Let’s look at some of them.

 

Language gap

 

The 3GL and 4GL languages of the 70s and 80s were procedural and program-centric. This means that the programmer created a number of procedures to govern the functionality. A single program may have dealt with all aspects of this functionality, starting with user interaction and ending with access to data. In modern environments, object-oriented languages are preferred. The programmer creates a number of specialized procedures which react to events or messages flowing through the system. With such different programming models, it is almost impossible to transform old code automatically into new code that looks native to the target language.

 

Batch/online gap

 

Many legacy applications run large components in batch mode, usually in a nightly cycle. The trend is toward real-time transactions, in which data is updated or retrieved almost instantly. Batch programs tend to be large and complex, dealing with multiple transaction types and with large amounts of data. Online programs tend to be smaller and specialized in a single type of transaction. Online systems have client and server components, which are usually separated. The transformation from batch to online requires a new design, which can rarely be generated automatically from the existing application.

 

Supporting technology gap

 

There is sometimes no mapping between the artifacts of a legacy application and those of a modern technology. While a legacy application in a CICS environment operates with programs, CICS transactions, COMMAREAS, temporary and transient queues, modern systems use quite different concepts, like classes, EJBs, application servers, etc. While some parallels may be drawn, some legacy artifacts cannot be mapped into new technology.

 

Database technology gap

 

While data is easier to reengineer than processes, some difficulties persist. A modern relational model is capable of expressing most other types of data organization, but it is much more difficult to match data manipulation statements. Both data models and data access can be reengineered, but in the case of the latter, some loss of efficiency and quality occurs during an automatic transformation.

 

The loss of quality

 

For each of the gaps listed above, serious progress may be made in providing automatic transformation, but the quality of the results remains a major concern. We use the concept of quality here in its most generic sense; however, some aspects are worth mentioning.

 

Proper style – by this we mean that the code resulting from the automated transformation should conform to the style expected by the average programmer in the target environment. It is possible, for instance, to generate Java code which looks entirely like Cobol, but this is not satisfactory for the Java programmer who is stuck with maintaining the new implementation.

 

Maintainability – the resulting code should be clear and concise. The “garbage in – garbage out” principle still stands, but if the creators of the original application took pains to create clear, concise, easy to understand and maintain code, the expectation is that the new application conforms to the same standards. In fact, modern languages have even more potential for clarity and maintainability and the stakes are higher.

 

Performance – must be tuned to the new environment. The performance requirements may be quite different between the old and new implementations. This is very clear, for example, when a batch application is transformed into an online application in which response time is of crucial importance.

 

A high degree of automation resulting in a high quality implementation is the holy grail of legacy transformation. Unfortunately, the two objectives – automation and quality – are most often contradictory. The following graph illustrates the relationship between the two:

 

 

 

The two curves show the decrease in quality for two cases: when the initial quality of the source application is high (the upper curve) and when it is low (the lower curve). Working with the two factors, quality and automation, a transformation team may set two thresholds: one for quality, which must never fall below a certain level, and one for automation, below which the team would simply revert to manual transformation. The picture illustrates that automation is possible and makes sense only on a restricted portion of the curve. The real art is to determine that portion: to figure out what type and degree of automation would deliver the best results while preserving a satisfactory level of quality.

 

We can express the relationship between various factors involved in a code transformation project in a simple formula (although in reality it may be more complex):

 

Equation 1:  $Q_{final} = K \cdot \dfrac{Q_{init}}{A \cdot D}$

Where

 

Qinit is the initial quality of the source code,

Qfinal is the final quality of the target code after transformation,

A is the degree of automation,

D is the size of the domain metamodel, i.e. the number of entity types that appear in the model (for instance programs, tables, screens, etc.),

K is a constant.

 

A better way to express this equation is

 

Equation 2: $Q_{final} = \operatorname{average}\left(\dfrac{K_i \cdot Q_i}{A_i \cdot D_i}\right)$ over all $D_i$

 

where Di is a partition of the metamodel.

 

This simply means that the original domain of entities is split into subdomains and the automation is applied to each of them. For instance, screens, programs and tables would be transformed separately and independently from each other.
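
A small numerical sketch may help show why partitioning matters. It assumes that the final quality of each partition is K times the initial quality, divided by the product of the degree of automation and the domain size, which is the reading of Equation 2 that is consistent with the rest of the argument. All values below are arbitrary illustrative numbers in arbitrary units, not measurements.

```python
def quality(k, q_init, automation, domain_size):
    """Final quality under the assumed reading K * Qinit / (A * D)."""
    return k * q_init / (automation * domain_size)

def partitioned_quality(partitions):
    """Equation 2: the average of the per-partition qualities.

    partitions: list of (k, q_init, automation, domain_size) tuples.
    """
    values = [quality(*p) for p in partitions]
    return sum(values) / len(values)

# One monolithic transformation over a metamodel with 4 entity types.
whole = quality(k=1.0, q_init=0.8, automation=0.8, domain_size=4)

# The same automation applied separately to two partitions of 2 entity types.
split = partitioned_quality([
    (1.0, 0.8, 0.8, 2),
    (1.0, 0.8, 0.8, 2),
])

print(whole, split)
```

With the degree of automation held constant, the smaller domains yield a higher computed quality, which is the effect the partitioning strategy relies on.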

 

As is usually the case in engineering, the problem of delivering both automation and quality may be approached by breaking it into parts and devising strategies which lead to satisfactory solutions.  This is equivalent to creating partitions in equation 2, such that automation may be properly applied on each of them. Such partitions may be created not only for objects that are totally different in nature (like programs and tables), but also for classes of objects of the same type. For instance, the total set of programs may be divided into client programs and server programs, and different automation methods may be applied for each. Furthermore, server type programs may be divided into “data retrieval” and “data update,” then “data retrieval” may be divided into “single row retrieval” and “multiple rows retrieval,” and so forth. The automation process would then be adapted to the specifics of each, while totally abandoned for some of them.
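
The nested partitioning described above can be sketched as a classification step followed by a lookup of the automation method chosen for each partition. The attribute names, the strategy labels and the sample programs are hypothetical; partitions with no entry fall back to manual transformation, which corresponds to abandoning automation for them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Program:
    name: str
    tier: str          # "client" or "server"
    access: str = ""   # "retrieval" or "update" (server programs only)
    rows: str = ""     # "single" or "multiple" (retrieval programs only)

def partition_of(program):
    """Place a program in the client / server / retrieval / update hierarchy."""
    if program.tier == "client":
        return ("client",)
    if program.access == "retrieval":
        return ("server", "retrieval", program.rows)
    return ("server", program.access)

# Hypothetical mapping from partition to automation method.
STRATEGY = {
    ("client",): "screen converter",
    ("server", "retrieval", "single"): "single-row template",
    ("server", "retrieval", "multiple"): "cursor template",
}

def plan(programs):
    """Group program names by the strategy selected for their partition."""
    result = {}
    for program in programs:
        strategy = STRATEGY.get(partition_of(program), "manual")
        result.setdefault(strategy, []).append(program.name)
    return result

programs = [
    Program("ORDSCR", "client"),
    Program("ORDGET", "server", "retrieval", "single"),
    Program("ORDLST", "server", "retrieval", "multiple"),
    Program("ORDUPD", "server", "update"),
]
print(plan(programs))
```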

 

In practice, the code transformation strategies may appear as below.

 

Adaptation to specifics

 

As there is no “magic bullet” software ready to transform any given legacy application into a modern one, it is both necessary and practical to adapt to the specifics of the source and target applications. Such an adaptation does not address just the generic source and target languages and environments; it also takes advantage of any particular standards and styles found in the source application. A software engineer may notice, for example, that the source application follows certain naming standards which give clues about the usage of data. Naming standards may also differentiate between small static tables (whose content does not change frequently) and large dynamic tables, suggesting different methods for accessing them.
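
As a minimal sketch of exploiting a naming standard, suppose (purely as an assumption for this example) that static reference tables carry the prefixes REF or COD while all other tables are large and dynamic. The transformation software could then select a different access method for each kind:

```python
def access_method(table_name, static_prefixes=("REF", "COD")):
    """Pick an access method from the table name prefix.

    The prefixes are hypothetical; a real project would take them from the
    naming standard actually observed in the source application.
    """
    prefix = table_name.split("-", 1)[0].upper()
    if prefix in static_prefixes:
        return "load once and cache"
    return "query on demand"

for name in ["REF-COUNTRY", "COD-CURRENCY", "TRN-PAYMENT"]:
    print(name, "->", access_method(name))
```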

 

Iterative transformation cycles

 

Adaptation to specifics cannot be achieved instantly, but only through multiple attempts at automatic transformation, each transformation being followed by a careful study of the results. The transformation is broken into multiple cycles, each one resulting in a higher level of quality.

 

Each cycle would repeat the following four steps:

 

Although steps 1, 2 and 4 are time consuming, step 3 may deliver a degree of automation that will by far compensate for the time and effort required by the other steps. The only question is how many times the cycle is repeated.

 

Here is a real-life example in which a very large banking application was transformed from COBOL/CICS to Java.

 

Two teams were involved: one, made up of two developers, adjusted the automatic transformation software; the other, made up of two programmer-analysts, examined the source and target applications and suggested changes to the transformation software. Each cycle took about two months, and there were about five cycles. This resulted in about 20 person-months of effort, as opposed to the roughly 1000 person-months that manual transformation would have required.

 

Partial transformation steps

 

In some cases it is recognized that nearly 100 percent automation is either technically impossible or not economically feasible; however, partial automation is still attractive. As already mentioned, data model transformation is generally easier than process transformation. Data manipulation methods are easy or difficult to translate depending on the differences between the databases (VSAM to relational is relatively easy; CODASYL to relational is more difficult). Language translation provided by select vendors works well for purely algorithmic code that does not involve data access or user interface interaction.
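
The relative ease of the VSAM-to-relational case can be illustrated with two common keyed-file access patterns and their SQL counterparts: a read by key, and a browse in key order starting from a given key. The sketch below uses SQLite only as a convenient stand-in for a relational target; the table, column and function names are hypothetical.

```python
import sqlite3

def keyed_read_sql(table, key_column):
    """SQL counterpart of a keyed read: fetch the record with the given key."""
    return f"SELECT * FROM {table} WHERE {key_column} = ?"

def browse_sql(table, key_column):
    """SQL counterpart of a browse: records in key order from a starting key."""
    return f"SELECT * FROM {table} WHERE {key_column} >= ? ORDER BY {key_column}"

# A small relational table standing in for a converted keyed file.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (cust_id TEXT PRIMARY KEY, name TEXT)")
db.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("001", "Adams"), ("002", "Baker"), ("003", "Clark")],
)

one = db.execute(keyed_read_sql("customer", "cust_id"), ("002",)).fetchone()
rest = db.execute(browse_sql("customer", "cust_id"), ("002",)).fetchall()
print(one)
print(rest)
```

Navigational databases offer no such direct correspondence, since a program's position within a set of linked records has no single-statement equivalent in SQL.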

 

Model-based transformation

 

As explained above, code transformation has some inherent difficulties that are multiplied when the gap between technologies is larger. Model-based transformation is another approach, in which some of these difficulties are overcome by abstracting out the technical details of the legacy implementation and focusing on the essential functionality of the application.

 

In model-based transformation, an analyst uses a specialized tool to extract a model that describes the current functionality of the application. This is neither a purely automatic, nor a purely manual effort, but what we may call a “tool assisted” effort.

 

The advantage of the extraction tool is that it allows the analyst to use the legacy application as raw material, select the aspects to be extracted and automatically convert them to model artifacts. This is quite different from building the model from interviews, manually drawn diagrams or code listings. At the same time, the advantage of the “tool assisted” model is that the analyst digests the information and ignores the incidental details, retaining only those aspects which are of high interest. The resulting model has the human touch, which makes it comprehensible and useful for the next phases.

 

Once a model is extracted, it may be used immediately to generate artifacts in a modern technology-based application, or it may be further refined by adding new functionality or implementation details reflecting the target architecture.

 

The last step of the process is the generation of the new implementation artifacts. Going from model to code is a classic exercise, supported by multiple software tools. Not only is this step automated to a high degree, it also delivers excellent quality, as the generated code tends to be well organized, predictable and standardized.
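
Generation from a model can be sketched in a few lines. Here the "model" is reduced to an entity name with a list of typed fields, and the generator emits a Java class with private fields and getters. This is a deliberately small assumption-laden example, not the output format of any particular modeling tool; the entity and field names are hypothetical.

```python
def generate_java_class(name, fields):
    """Emit Java source for an entity.

    fields: list of (java_type, field_name) pairs taken from the model.
    """
    lines = [f"public class {name} {{"]
    for java_type, field in fields:
        lines.append(f"    private {java_type} {field};")
    for java_type, field in fields:
        accessor = "get" + field[0].upper() + field[1:]
        lines.append("")
        lines.append(f"    public {java_type} {accessor}() {{")
        lines.append(f"        return {field};")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)

source = generate_java_class(
    "Account",
    [("String", "accountId"), ("java.math.BigDecimal", "balance")],
)
print(source)
```

Because every class comes out of the same template, the generated code is uniform by construction, which is where the predictability and standardization mentioned above come from.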

 

In the model-based transformation approach the quality issue is minor or non-existent, for two reasons: the incidental technical details of the source application have already been abstracted out, and generation from a well-designed model usually results in clean and efficient code.

 

A comparison between code transformation and model-based transformation is summarized in the table below.


 

 

 

|  | Code transformation | Model-based transformation |
|---|---|---|
| Capacity to capture details | High | Low |
| Capacity to quickly generate individual artifacts, not integrated in a consistent architecture | High | Low |
| Capacity to generate a consistent, integrated architecture | Low | High |
| Quality of generated artifacts | Low to average | High |
| Flexibility (as both the source and the target applications change) | Low | High |

 

 

 

 

Intermediate models

 

Which types of models capture the functionality of a legacy application for the purpose of creating a new, modern implementation? There are perhaps many types of models, supported by various tools, but we would like to point to three that enjoy widespread acceptance and usage and are supported by existing standards.

 

UML – In this case the analyst may extract information in the form of use cases, activity and sequence diagrams or class diagrams, etc. Most commercially available software tools that support UML also offer a forward engineering capacity that allows generation of high quality code.

 

Business Rules – Specialized software tools are able to help locate the business rules of an application. These rules govern the minute computations, decisions or validations in an application and express the specifics of how the business decides to run certain aspects of its operations.

 

Data models – In most cases, data models can be extracted from the data definitions existing in the application. If the source application and the new application are based on the same type of database technology, few model changes are needed; in other cases, however, some model derivation may be necessary. One such example is the transition from a hierarchical to a relational model.
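
As an illustration of extracting a data model from existing data definitions, the sketch below maps a few simple Cobol PICTURE clauses to SQL column types and emits a table definition. It handles only alphanumeric X(n) and numeric 9(n) fields with an optional sign and implied decimals, and it deliberately rejects everything else; the field names and the target types are assumptions for the example.

```python
import re

def sql_type(pic):
    """Map a simple Cobol PICTURE clause to a SQL column type."""
    pic = pic.upper().replace(" ", "")
    match = re.fullmatch(r"X\((\d+)\)", pic)
    if match:
        return f"CHAR({match[1]})"
    match = re.fullmatch(r"S?9\((\d+)\)(?:V9\((\d+)\))?", pic)
    if match:
        digits = int(match[1])
        scale = int(match[2] or 0)
        return f"DECIMAL({digits + scale}, {scale})"
    raise ValueError(f"unsupported PICTURE clause: {pic}")

def create_table(table, fields):
    """Build a CREATE TABLE statement from (cobol_name, picture) pairs."""
    columns = ",\n".join(
        f"    {name.lower().replace('-', '_')} {sql_type(pic)}"
        for name, pic in fields
    )
    return f"CREATE TABLE {table} (\n{columns}\n);"

ddl = create_table(
    "customer",
    [("CUST-ID", "9(5)"), ("CUST-NAME", "X(30)"), ("CUST-BALANCE", "S9(7)V9(2)")],
)
print(ddl)
```

Clauses outside this small subset raise an error rather than being converted silently, so that unusual definitions are reviewed by a person.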

 

We may observe that with these three types of models, it is theoretically possible to express most functional aspects of an application, at the same time leaving aside the incidental technical implementation details. This presents a great advantage, resulting in a clean design and the freedom to choose the best technical implementation of the transformed application, in view of the organization’s existing software, skills and preferences. Another advantage is the fact that the model extraction and refinement may be executed at any point in time, before the new implementation is built. The IT organization would immediately benefit from a clear documentation of the existing application, and the actual reimplementation work may start when the time is right and all major decisions regarding architecture are made.

 

Conclusions

 

Legacy transformation projects may benefit greatly from automation, as long as both the possibilities and the obstacles are well-known in advance. There is no unique technology or methodology to address all issues at once, but there is a large spectrum of partial solutions. Choosing the right portfolio of such solutions is the key to a successful transformation project executed in time and with high quality results.

 

Being fully aware of both the benefits and the pitfalls of various automation methods, a team charged with executing an application transformation may choose a portfolio of such solutions, which best fits its goals and constraints. For example, data transformation could proceed with a common conversion solution, UI transformation may proceed on a code transformation path, and procedural code may be used in a model-based transformation.

 

As transformation projects are relatively new for many IT organizations, there is little in-house experience in selecting and executing the best strategy. There are, however, software companies in the legacy transformation space as well as system integrators that together may offer both technology and experience perfected through many previous projects.

 

At this point, transformation projects require both art and technology. It is conceivable that in the long term the art component would be reduced in favor of the technology component. What is certain is that as legacy applications age, such transformation projects will become more and more frequent and organizations will have to prepare for them.