
Presented at GBCMG in Boston, September 1994

                  MVS and UNIX, are they really that different?
                             by Jaqui Lynch
                             Boston College

One of the major issues faced by systems personnel as they migrate into the client/server world is that of providing the same level of performance and capacity information to management that was provided in the MVS world. This is complicated by the need to be able to determine how and where resources should be allocated - what should run on the mainframe, what should be distributed, and how to build some kind of correlation between the mainframe and workstations to assist with such decisions. This paper looks at some of the factors that need to be taken into account before such decisions can be made, and at some of the similarities (and differences) between UNIX and MVS in the capacity planning and performance arenas.

With CPU requirements increasing at a time when budgets are decreasing, and with the need to develop applications more rapidly, it has become apparent to a large number of mainframe sites that it will be necessary to combine, or even replace, the legacy mainframe systems with client/server or workstation based systems. One of the major issues involved here, from a capacity planning standpoint, is how to provide the same level of reporting and capacity planning that management have become accustomed to in the MVS world, and how to leverage the MVS knowledge that capacity planners and performance analysts have today into the UNIX world.

One of the first obstacles is gaining an understanding of the terminology used and learning a whole new set of acronyms, some of which look the same but mean different things. Once this hurdle is overcome it is merely an exercise in ensuring that the differences between the two systems are clearly understood, so that currently implemented methodologies can be transferred into the new world.

So, are MVS and UNIX really that different? The answer, of course, is that "It Depends". In the capacity and performance worlds the metrics and methodologies are still the same. The tuning process consists of the following:

1. Identify the workload and measure it.
2. Set objectives.
3. Make changes.
4. Measure the effect of the changes.
5. Interpret and understand the results.
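As an illustration, the five steps above can be sketched as a simple feedback loop. The measurement and change functions, and the 70% CPU objective, are purely hypothetical placeholders; in practice the measurements would come from tools such as vmstat or sar.

```python
def tune(measure, apply_change, objective=0.70, max_rounds=5):
    """Iterate the five tuning steps until the objective is met
    or a change stops helping. All inputs are illustrative stubs."""
    utilization = measure()                  # step 1: measure the workload
    for _ in range(max_rounds):
        if utilization <= objective:         # step 2: is the objective met?
            break
        apply_change()                       # step 3: make a change
        new_util = measure()                 # step 4: measure the effect
        if new_util >= utilization:          # step 5: interpret the result
            break                            # the change did not help; stop
        utilization = new_util
    return utilization
```

Feeding the loop a series of simulated readings (say 0.90, 0.80, 0.65) shows it stopping as soon as the objective is reached.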

Many decisions are also based on the same criteria that they always have been:

1. Upgrade the CPU when it gets too busy.
2. Share the data across multiple controllers.
3. Buy more disk when you run out.

In many of these areas, all that has really changed is the rules of thumb and, of course, there are now a few more options, such as: why upgrade the processor - why not just distribute the processing to a new system?

One of the major changes when moving into the UNIX environment is that it is important to consider items that didn't matter before such as: how to distribute applications and what platform to distribute them onto. This could be based on CPU requirements, whether packages are available that perform the functions required, I/O rates, memory requirements, throughput and other resource utilizations. In order to be able to make these decisions it is important to know what the cutoff points are on the new platform - i.e. it is important to develop new and rational ROTs (rules of thumb).

A good first step is to develop a spreadsheet for the candidate systems containing all quoted SPECmarks, etc. for those systems. A ratio from one system to the next can be useful for comparing the various systems to each other. Of particular concern for me was that systems were being sized for both administrative and faculty computing, and the requirements for the two are almost diametrically opposed. For a system intended to run simulations, the primary concerns are CPU power, reliability and memory, as the user typically intends to run CPU-intensive simulations which write to disk infrequently. Such a system should have sufficient memory so that the application does not page. Such systems are rarely multi-user, and the user wants the simulation to be given whatever it wants immediately. Administrative systems are very similar to the MVS world - paging is a concern if it hurts the system, but some paging is acceptable. These systems are multi-user, with both batch and online workloads, and CPU power is not as critical as disk throughput. Controller and LAN throughput, however, become critical metrics for these systems.
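A minimal sketch of such a comparison spreadsheet, using invented SPECint92 figures purely for illustration (real values would come from vendor quotes or published SPEC results):

```python
# Hypothetical SPECint92 figures, for illustration only; real values
# would come from vendor quotes or published SPEC results.
specint92 = {
    "System A": 32.0,
    "System B": 48.0,
    "System C": 96.0,
}
baseline = "System A"

def ratios(marks, base):
    """Express each system's rating as a multiple of the baseline system."""
    return {name: value / marks[base] for name, value in marks.items()}

for name, r in ratios(specint92, baseline).items():
    print(f"{name}: {r:.2f}x {baseline}")
```

The same ratio column can be computed for each published benchmark (SPECfp92, Linpack, and so on), giving a rough per-workload comparison between systems.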

With planners now having to work at the LAN level, the LAN becomes very important in performance tuning and capacity planning, particularly with respect to timing backups, archivals and data movement around peak load times. If the LAN itself doesn't have sufficient throughput capacity, then this can also disguise bottlenecks in the systems themselves, as they will not be running at the true demand levels. Latent demand exists just as surely in the client/server world as it did in the MVS world.

Because the LAN is now involved to a much greater extent, not only do LAN speed and LAN adapter throughput become important, but there are many additional issues. It is essential to back up data that is spread across multiple systems and to manage those systems so that the data is timely and accurate at all times. Data integrity in the client/server world becomes a performance issue, as multiphase commits and rollbacks add additional overhead.

There are also other data related issues such as transparent access to data across a range of platforms and technologies. Technology plays a key role in the tuning decisions - while an MVS system may have two controllers with 60GB of disk behind each one, it would be very unusual to put that amount of disk behind one SCSI controller. The first SCSI controller has not only disk behind it, but also has CD-ROM, the tape drive and possibly other equipment connected to the same controller - a phenomenon unheard of in the MVS world. The SCSI controller also does not have the amount of cache or the specialized features such as quad-porting or channel reconnect that the 3990-style controllers have. In general, the SCSI controller transfers all of its data on a single bus, which can cause major contention problems. Thus, despite the fact that the CPUs are cheaper, balancing I/O across controllers can become critical if high I/O rates are to be maintained.
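As a sketch of the balancing idea, a greedy assignment of filesystems to controllers can be written as follows; the filesystem names and I/O rates are invented for illustration:

```python
import heapq

def balance(filesystems, n_controllers):
    """Greedily assign each filesystem to the currently least-loaded
    controller. filesystems maps name -> observed I/O rate (ops/sec)."""
    heap = [(0.0, c, []) for c in range(n_controllers)]
    heapq.heapify(heap)
    # Place the busiest filesystems first so the split stays even.
    for name, rate in sorted(filesystems.items(), key=lambda kv: -kv[1]):
        load, c, assigned = heapq.heappop(heap)
        assigned.append(name)
        heapq.heappush(heap, (load + rate, c, assigned))
    return {c: (load, names) for load, c, names in heap}

# Invented I/O rates, purely for illustration.
layout = balance({"db": 120, "logs": 80, "home": 60, "tmp": 40}, 2)
```

In this invented example the two controllers end up carrying 160 and 140 ops/sec respectively, rather than all the hot filesystems landing behind one controller.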

Workload management is another interesting challenge. It is important to know whether the workload is a client, a workstation or a server so that it can be set up correctly, and whether network access will be NFS, AFS, TCP, FTP or a NetWare-style protocol. There are also various types of server, such as mail servers, application code servers, boot servers for X-stations, data servers, database servers and so on. Each one can have totally different performance needs.

There are three additional workload categorizations that describe how any workload runs - batch, background and interactive. Essentially, they differ only in how the user waits for the results: interactive locks out the screen or window until the job is finished; background requires that the user stay logged on while the job runs, but the user can do other things during that time; and batch is basically background, except that the user can log out and the results are emailed back to that userid.
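The same three behaviors can be mimicked programmatically; a sketch using Python's subprocess module, with the sleep command standing in for a real job:

```python
import subprocess

# Interactive: the caller blocks (the "screen is locked")
# until the job ends.
interactive = subprocess.run(["sleep", "0"])

# Background: the job runs concurrently while the caller keeps working,
# like the shell's `cmd &`; it still belongs to this login session.
job = subprocess.Popen(["sleep", "0"])
# ... the caller can do other things here ...
job.wait()

# Batch adds detachment from the login session: on UNIX this is typically
# `at`/`batch` (with the output mailed back to the user) or `nohup cmd &`.
```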

Another difference is in how CPU is rated across systems. Mainframes are quoted as being a certain number of MIPS or as having a particular ITR for a certain workload. Workstations are measured in Dhrystones, Linpack MFLOPS, TPC-A, TPC-B or SPECmarks, and there are several variations on SPECmarks: SPECmark89, SPECint89, SPECfp89, SPECint92 and SPECfp92. At this point there is no coherent correlation between SPECmarks and MIPS, which makes it difficult to take a mainframe workload and correctly size a workstation for it. This is going to cause a major problem as products such as CICS become more widely established on the workstation platform. It is becoming critical that some cross- and inter-platform benchmark data be produced to simplify this task.

There are also some additional issues to look at in the workstation world that are either nonexistent or of minimal importance in the MVS world. Items such as graphics adapters, pixels, number of slots, tape size and speed and optical disk can become critical components from both a performance and a configuration standpoint.

Whatever the differences, however, there are still many similarities between MVS and UNIX. It is still important to tune systems for the workload being run, to prioritize the workloads, and to try not to mix workloads on the same system wherever possible. Disk and controller throughput are still vital and balancing I/O is still a consideration. It is still important to back up the data, and with multiple systems it may make better sense to use a centralized backup site to avoid problems with management, staffing and storage of additional media. This, however, adds additional pressure to the LAN. The critical metrics are still CPU, memory, I/O and controller throughput, and the existing ROTs need only be modified to apply to them. The major component that needs to be added is LAN throughput.
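A sketch of how such ROTs might be encoded as a simple threshold check; the threshold values are illustrative only and should be recalibrated for each site and platform:

```python
# Illustrative rules of thumb only; real ceilings are site- and
# platform-specific and must be recalibrated for each new system.
THRESHOLDS = {
    "cpu_busy_pct": 70,
    "memory_used_pct": 85,
    "disk_busy_pct": 40,
    "lan_util_pct": 30,   # the metric newly added in the client/server world
}

def check(metrics):
    """Return only the metrics that exceed their rule-of-thumb ceiling."""
    return {m: v for m, v in metrics.items()
            if m in THRESHOLDS and v > THRESHOLDS[m]}

snapshot = {"cpu_busy_pct": 75, "memory_used_pct": 60, "lan_util_pct": 10}
hot = check(snapshot)
```

Here only the CPU figure is flagged; changing a threshold is a one-line edit, which is exactly the property a ROT needs when moving to a new platform.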

There are now many tools available in the UNIX world to measure and report on the necessary metrics. Although most of them simply trust the values stored in /dev/kmem, that is at least a reasonable place to start. The only part that is really missing is the ability to take workload information from MVS and model it on UNIX to ascertain the best configuration and the best platform. Within the next two years or so, this should no longer be the case.
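As a sketch of what such tools do, the following reduces vmstat-style output to an average CPU-busy figure; the sample text is invented, and real vmstat column layouts vary by UNIX flavor:

```python
# Invented vmstat-style sample; real output has more columns and
# the layout differs between UNIX implementations.
SAMPLE = """\
 r  b   swpd   free  si  so   us sy id
 1  0      0  81234   0   0   12  5 83
 2  0      0  80990   0   0   40 10 50
"""

def cpu_busy(vmstat_text):
    """Average (user + system) CPU percent across the sample lines,
    assuming the last three columns are us, sy and id."""
    rows = [line.split() for line in vmstat_text.splitlines()[1:]]
    busy = [int(us) + int(sy) for *_, us, sy, _idle in rows]
    return sum(busy) / len(busy)
```

For the sample above this yields (17 + 50) / 2 = 33.5% busy, the kind of figure that would then be compared against a CPU ROT.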

Lastly, it is important to remember that the workstation is neither a cheap mainframe nor a large PC, and that to perform well it requires not only systems programmers (systems administrators in UNIX), but also performance specialists and capacity planners with a good understanding of networked systems.


© 1995 Jaqueline A. Lynch

Compiled and edited by Jaqui Lynch

Last revised June 5, 1995