Configuring a DCE Cell for Performance

Jaqui Lynch
Boston College
One of the major issues faced by Capacity Planners as they move into the Distributed Computing World is how to configure the new systems for the best performance. This paper attempts to address that question by introducing the components of DCE and then investigating what is required to set up the DCE environment so that good performance is achieved. In particular, issues relating to configuration planning are covered.
Boston College has been exploring the Distributed Computing world since 1992, and in late 1993 decided to move into that world by implementing a pilot project to test the performance and functionality of the Distributed Computing Environment (DCE).
It was decided that the pilot should consist of setting up a DCE cell of UNIX machines and personal computers running DOS and OS/2. The applications to be modelled would be chosen at a later date, once the dynamics of the environment were better understood.
Some of the issues in setting up a cell are generic UNIX concerns, such as the speed of SCSI disks and buses. It is not the intent of this paper to address these, but rather those that are DCE specific. In order to understand what is involved in setting up a DCE cell, it is important to understand the components necessary to create the cell.
A DCE cell consists of several components: Security services, Cell Directory Services (CDS), Time services, Threads, Application servers, and Application clients. Global Directory services are used for communication between cells and with the outside world. A cell can be thought of as an administrative domain containing at least one of each of the following - a Security server, a CDS server, a Time server, and Threads support. Although all of the above can be configured onto one machine, keeping the cell very small, this is not the recommended solution for a large cell with many clients.
Security services are responsible for maintaining the registry of principals, authentication into the cell, and authorization for access to services, including applications. They are also responsible for encrypting data and ensuring that passwords never travel in clear text. The security system is based on Kerberos V5 from MIT, and security is accomplished by passing encryption keys and tickets between clients, servers and the security server. Access control is done using a combination of ACLs and UNIX permissions.

It is important to clearly understand the level of security required for each application, as this can have a major impact on performance. The options range from no security up to full Kerberos style security. If full encryption is chosen there can be a significant performance hit, so it may be better to use cryptographic checksums except for very sensitive data. The ticket expiration time for credentials currently defaults to 168 hours; this should be changed to a much smaller value, although making it too small can cause network congestion as users reestablish their credentials too frequently. Once the feature is available, it will assist performance to have a couple of read-only replicas of the security server on the network, reducing congestion when authenticating and authorizing access. Most vendors today are offering DCE 1.0.2; full security replication is part of DCE 1.1, which is probably a year away.
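On the client side, the protection level is chosen when the binding handle to the server is annotated with authentication information. The C sketch below (the binding handle and server principal name are hypothetical) uses the standard rpc_binding_set_auth_info() call to choose per-packet cryptographic checksums rather than full encryption:

    #include <dce/rpc.h>

    /* Sketch only: "binding" is assumed to have been imported from the
     * CDS already, and the server principal name is a placeholder.    */
    void set_protection(rpc_binding_handle_t binding)
    {
        unsigned32 status;

        rpc_binding_set_auth_info(
            binding,
            (unsigned_char_p_t)"/.:/servers/app_server", /* hypothetical principal */
            rpc_c_protect_level_pkt_integ, /* checksum each packet; use         */
                                           /* rpc_c_protect_level_pkt_privacy   */
                                           /* (full encryption) only where the  */
                                           /* data is very sensitive            */
            rpc_c_authn_dce_secret,        /* Kerberos style shared-secret      */
            NULL,                          /* use the default login context     */
            rpc_c_authz_dce,               /* PAC-based authorization for ACLs  */
            &status);
    }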
The prime function of Cell Directory Services is to keep track of where principals are in the cell. The directory has an entry for every principal, such as an application server or a service, and its associated location. Whenever a server comes up, it registers its status and location with the CDS, and it should also notify the CDS when it terminates.
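In RPC terms this registration goes through the name service interface. A minimal sketch, assuming a hypothetical CDS entry name and the interface handle the IDL compiler would generate for the application, might look like:

    #include <dce/rpc.h>

    /* Normally declared in the IDL-generated server header. */
    extern rpc_if_handle_t app_v1_0_s_ifspec;

    void register_with_cds(rpc_binding_vector_t *bindings)
    {
        unsigned32 status;

        rpc_ns_binding_export(
            rpc_c_ns_syntax_dce,                         /* DCE name syntax    */
            (unsigned_char_p_t)"/.:/servers/app_server", /* hypothetical entry */
            app_v1_0_s_ifspec,                           /* interface on offer */
            bindings,                                    /* bindings to export */
            NULL,                                        /* no object UUIDs    */
            &status);
    }

    /* On orderly shutdown, a matching rpc_ns_binding_unexport() call
     * removes the entry so that stale bindings do not accumulate.    */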
The Distributed Time Service (DTS) ensures that all of the clocks in the network stay in sync. Because of clock drift it is possible for the clocks to get out of sync, which causes problems in following audit trails and in the security arena: key generation is based partially on time, and authentication will fail if there is more than a 5 minute (the default) time difference between machines.
The last required component is Threads, which can be compared to TCBs in the MVS world. Threads allow the server program to have multiple clients executing under its control. Some operating systems are automatically threaded, but others are not and will require the addition of threads for the DCE world. Threads allow parallelism within and across systems and allow the programmers to code overlap into their programs.
Good threads programming requires the use of mutex locks (similar to enq/deq) on shared variables, and raises many of the same issues as enq/deq - specifically, good programming discipline is required and locks should always be obtained in the same order. Performance can be affected by lock conflicts, by locks being held longer than necessary, or by deadly embraces (deadlocks). Finally, the main program should have a failsafe way of knowing when the last thread has finished, or that a thread has abended, otherwise the program may wait on that thread forever.
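The sketch below shows this discipline in final POSIX pthreads syntax for clarity (DCE threads actually used the earlier draft 4 interface): both locks are always taken in the same order, and main joins the worker so it learns when the thread has terminated, normally or otherwise.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
    static long shared_total = 0;     /* shared variable guarded by the locks */

    static void *worker(void *arg)
    {
        pthread_mutex_lock(&lock_a);  /* always lock_a first...         */
        pthread_mutex_lock(&lock_b);  /* ...then lock_b, never reversed */
        shared_total += *(long *)arg;
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        long increment = 1;

        if (pthread_create(&tid, NULL, worker, &increment) != 0)
            return 1;
        pthread_join(tid, NULL);      /* main learns the worker is done
                                         rather than waiting blindly    */
        printf("total = %ld\n", shared_total);
        return 0;
    }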
Moving to the DCE world involves dealing with many issues that can affect performance. Designing the cell involves resolving many of these issues at the beginning, and also requires an in-depth knowledge of the network topology, as this affects how the cell is set up. An example of this occurs in the areas of CDS and DTS advertising. These servers advertise their presence by broadcasting information, and broadcasts cannot traverse routers, so it is important to put additional servers in place should it become necessary to go across routers within the cell. This may involve setting up additional read-only CDS systems and courier DTS systems. The use of read-only CDS replicas (there is no write replication) is highly recommended, as everything goes through the CDS and this is where most bottlenecks occur.
The CDS should also be cleaned up regularly. If a server fails it may leave stale entries in the CDS, which can cause lookups to fail and will make lookups take longer, as there is more data to search through. Much of this can be avoided by having a master server that starts the other servers, with cleanup routines coded in the master for when the servers it controls terminate. Performance in the CDS can also be enhanced by searching on a combination of the name and the object UUID - this ensures that the server is found faster and is also the correct server (server names are not necessarily unique - UUIDs are).
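A client-side sketch of such a lookup follows; the entry name and object UUID string are placeholders, and the interface handle is assumed to come from the IDL-generated client header.

    #include <dce/rpc.h>

    extern rpc_if_handle_t app_v1_0_c_ifspec;  /* from the IDL-generated header */

    rpc_binding_handle_t import_by_name_and_uuid(void)
    {
        rpc_ns_handle_t      ctx;
        rpc_binding_handle_t binding = NULL;
        uuid_t               obj;
        unsigned32           status;

        /* Placeholder UUID; a real client would know its server's UUID. */
        uuid_from_string(
            (unsigned_char_p_t)"00000000-0000-0000-0000-000000000000",
            &obj, &status);

        rpc_ns_binding_import_begin(
            rpc_c_ns_syntax_dce,
            (unsigned_char_p_t)"/.:/servers/app_server", /* hypothetical entry */
            app_v1_0_c_ifspec,
            &obj,                        /* object UUID narrows the search */
            &ctx, &status);
        if (status == rpc_s_ok)
        {
            rpc_ns_binding_import_next(ctx, &binding, &status);
            rpc_ns_binding_import_done(&ctx, &status);
        }
        return binding;
    }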
When designing the DCE environment, the major performance issues include how many cells to have, what to put in each cell, and how many entries to have per cell. Since every user has to contact the security server for their cell, it makes sense to ensure that a cell does not span countries or major geographical areas (e.g. LA and Dallas). The decisions made at this point have an ongoing impact on how server replication is done and on network traffic.
Since every member of the cell has to contact both the security server and the CDS server before access is granted, it is recommended that those two services be placed on the same system. This also helps to avoid timing problems and reduces network traffic between the two.
With the upcoming release of DCE for MVS, the role of the mainframe is changing and it will be possible to integrate the mainframe into the distributed world. One of the key performance changes to be made to the cell would be to channel attach the Security/CDS server to the mainframe to provide faster access for authentication and application access.
One of the key issues is network traffic, and the impact on the network of adding a DCE cell. Because of the amount of setup that takes place, a DCE login alone generates approximately 280 packets. As more security and encryption is added, this overhead becomes even higher.
Another item that affects network traffic is the choice of graphical front-end, and how it is implemented. For instance, if screen-scraping is used to get data into the GUI, then full 3270 screens of data are being sent to the desktop, with the attribute bits, etc., being removed at that point. It is far more efficient to send just the required raw data, as this reduces network traffic.
Memory sizing is important in the DCE world, as many of the tables need to be kept in memory, and memory caching is used by the CDS servers. For a server a minimum of 32 MB is required, but a pilot is necessary to be sure that the memory sizes chosen are correct. All of the servers at BC have between 64 and 256 MB of memory.
In the DCE world, poor application design can have a far-reaching effect on performance. During the design and pilot phases, applications should undergo rigorous profiling and performance testing. One of the major problems is scalability - an application may work fine in a small test cell but perform very badly in a larger environment, so load or stress testing should also be considered.
It is also important to consider replication of servers (apart from Security and CDS). If high availability is required, it may be necessary to replicate servers, to have hot standbys, or to implement high availability options for those servers. Replication allows a machine to be taken down for maintenance, and also provides a performance boost by allowing a client to choose from several servers, which can be prioritized so that the workload is shared. If such a scenario is implemented, then some changes are recommended for the import of binding handles. Binding handles can be imported in two ways - individually, or as all of the handles for that type of server (called a vector).
If the client is going to be selecting from multiple servers, it should request the vector, so that it does not have to keep going back to the CDS for the next binding handle until it finds a server it likes. By combining this with the use of group or profile entries in the CDS, it is possible to provide failsafe, prioritized access to servers.
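A sketch of the vector approach (entry name hypothetical, interface handle assumed to come from the IDL-generated header): rpc_ns_binding_lookup_next() returns a vector of compatible bindings in one trip to the CDS, and rpc_ns_binding_select() then chooses among them locally.

    #include <dce/rpc.h>

    extern rpc_if_handle_t app_v1_0_c_ifspec;  /* from the IDL-generated header */

    rpc_binding_handle_t pick_server(void)
    {
        rpc_ns_handle_t       ctx;
        rpc_binding_vector_t *vec = NULL;
        rpc_binding_handle_t  binding = NULL;
        unsigned32            status;

        rpc_ns_binding_lookup_begin(
            rpc_c_ns_syntax_dce,
            (unsigned_char_p_t)"/.:/servers/app_profile", /* hypothetical group
                                                             or profile entry  */
            app_v1_0_c_ifspec,
            NULL,                             /* no object UUID restriction      */
            rpc_c_binding_max_count_default,  /* let the runtime size the vector */
            &ctx, &status);
        if (status == rpc_s_ok)
        {
            rpc_ns_binding_lookup_next(ctx, &vec, &status);    /* one CDS trip */
            if (status == rpc_s_ok)
            {
                rpc_ns_binding_select(vec, &binding, &status); /* local choice */
                rpc_binding_vector_free(&vec, &status);
            }
            rpc_ns_binding_lookup_done(&ctx, &status);
        }
        return binding;
    }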
With respect to the movement of data, there are two main approaches. For small, finite amounts of data it is usually recommended that RPC marshalling and unmarshalling be used; this involves embedding the data in the RPC and passing it across the network. However, if large amounts of data (> 1 MB) are to be transferred, or the data is of indeterminate size (an SQL call with the potential to return a lot of data), then it is worth investigating the use of pipes. A pipe is basically a datamover. It is important to choose the chunk size for a pipe carefully - an overly large size can mean moving lots of half-filled chunks, and a small size can mean moving many more chunks than necessary.
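On the client side, the data for a pipe is supplied through a callback that the stub calls repeatedly until it is handed a zero count. The parameter shape below paraphrases what the IDL compiler generates for a pipe of bytes (a sketch, not the exact generated signature); the chunk size is effectively set by how much data the routine hands back on each call.

    #include <stdio.h>

    /* Hypothetical pull routine feeding a pipe from an open file.
     * "state" is the application's own context pointer; here it
     * carries the FILE handle. A zero element count tells the
     * stub that the pipe is drained. */
    void file_pull(char *state, unsigned char *buf,
                   unsigned long esize, unsigned long *ecount)
    {
        FILE *fp = (FILE *)state;
        *ecount = (unsigned long)fread(buf, 1, esize, fp);
    }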
There are many other areas that can affect performance in a DCE cell. Two of the key remaining areas are the network itself and the Distributed File System (DFS). It is not the purpose of this paper to discuss the network area, except to say that network design has a major impact on how a DCE cell will be designed, and network congestion has a major impact on the performance of that cell. It is also important to have a strong network administration and design group with the authority to enforce and retrofit standards, which may include network addresses, naming standards, UNIX uids and gids, and other related items.
In the case of DFS, most of the issues are with respect to placement of data and how that data is used by clients and servers. Good programming discipline needs to be in place for access to data, particularly in regard to the use of mutexes (locks) to protect shared variables when threads are being used.
As can be seen from the above commentary, there are many things to consider in designing a DCE cell for performance. The key issues for success are definitely network performance and design, how security is implemented, and the use of replication for servers and services. A great deal of the responsibility for the performance has been moved back to the programmer, who has to be concerned about design decisions for the programs, and the effect they will have when there are hundreds of clients accessing the servers, as opposed to the initial few used in the pilot. However, with a valid pilot, and a good deal of forethought, it is possible to implement a DCE cell and still provide mainframe (or better) performance.
Other Reading:

OSF DCE Guide to Developing Distributed Applications, by Harold W. Lockhart
Understanding DCE, by Ward Rosenberry, David Kenney & Gerry Fisher
Guide to Writing DCE Applications, by John Shirley
OSF DCE User's Guide and Reference, by Open Software Foundation
Compiled and edited by Jaqui Lynch
Last revised June 5, 1995