Thursday, December 12, 2013

Improving mainframe performance with runtime and compiler optimization

Cutting CPU consumption on the mainframe is a full-time job. Every CPU cycle saved not only defers upgrades but also reduces usage-based software charges.
IBM Language Environment (LE) compilers and runtime offer many tuning opportunities -- without having to change the source code.

Compile time options

Through LE compiler optimization, mainframe programmers can tailor the object code to the strengths of a processor family.
ARCH (Architecture) is one compiler option. The ARCH level directs the compiler to generate object code with performance-enhanced machine instructions available on the target processor. ARCH is increasingly relevant, as IBM has introduced several generations of machines with instructions designed to boost performance.
Another option, TUNE, tells the compiler to arrange the machine code in an order that will take advantage of the processor's instruction pipeline and cache.
A programmer must set these options for the oldest processor family on which the program will run. Pick an ARCH level that is too high, and operation exceptions (S0C1 abends) could result when the program meets an instruction the machine does not support; a bad TUNE level will not cause abends but could hinder performance.
Although these options were originally exclusive to the C++ compiler, IBM has extended them to the COBOL and PL/I compilers as well.
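The "oldest processor" rule amounts to taking a minimum. Here is a minimal Python sketch, assuming you keep an inventory of the highest ARCH level each machine supports. The machine names and level numbers are hypothetical placeholders, not a statement about any real hardware.

```python
def safe_arch_level(machine_arch_levels):
    """Return the highest ARCH level that every listed machine can run.

    machine_arch_levels maps a machine name to the highest ARCH level
    that machine supports. The safe choice is the lowest of those values.
    """
    if not machine_arch_levels:
        raise ValueError("need at least one machine in the inventory")
    return min(machine_arch_levels.values())


# Hypothetical inventory: names and levels are made up for illustration.
inventory = {"PROD1": 10, "PROD2": 9, "DRSITE": 8}

level = safe_arch_level(inventory)
print(f"ARCH({level})")
```

Remember to include every machine the load module might land on, not only the primary production system.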

Runtime options

IBM LE also offers several runtime options that can improve mainframe performance. They can be specified at several levels, listed here from the most specific to the most general:
  • Runtime options specified at program invocation
  • Options linked into the program with a user options (UOPT) control section (CSECT)
  • Region options (ROPT) module
  • Global options set in the CEEPRMxx PARMLIB member
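As a rough illustration of how such a hierarchy resolves, the Python sketch below treats each level as a dictionary and lets the more specific level win. It assumes the list above runs from highest to lowest precedence, and it ignores any options an installation may have marked as non-overridable. The option values are placeholders, not recommendations.

```python
from collections import ChainMap


def effective_options(invocation, uopt, ropt, parmlib):
    """Merge option levels; the leftmost level that sets an option wins."""
    # ChainMap looks up keys in its maps from left to right.
    return dict(ChainMap(invocation, uopt, ropt, parmlib))


# Hypothetical settings at each level (values are illustrative only).
parmlib = {"CBLPSHPOP": "ON", "STORAGE": "NONE"}   # global defaults
ropt = {"CBLPSHPOP": "OFF"}                        # region override
uopt = {}                                          # nothing linked in
invocation = {"STORAGE": "00"}                     # set at invocation

opts = effective_options(invocation, uopt, ropt, parmlib)
for name in sorted(opts):
    print(name, "=", opts[name])
```

In this made-up case the region setting decides CBLPSHPOP and the invocation setting decides STORAGE, because nothing more specific overrides them.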
Three groups of runtime options deserve attention, particularly in a CICS environment: CBLPSHPOP, which governs condition handling across COBOL routines; storage initialization; and stack and heap sizes.
CBLPSHPOP. The CBLPSHPOP option controls whether LE issues PUSH HANDLE and POP HANDLE CICS commands when entering or exiting a COBOL routine. The PUSH HANDLE command stacks all outstanding handle conditions, while POP HANDLE restores the handle conditions from the previous push. If any conditions are raised for which there are outstanding handles, control will shift to the error routine specified in the HANDLE command.
Turning off the CBLPSHPOP option will save CPU cycles by avoiding the extra PUSH and POP commands. However, without CBLPSHPOP, a condition raised in a lower module could percolate up to a higher-level handle routine that was not prepared for the error. Only change the setting after analysis and testing.
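The percolation risk can be pictured with a toy model. The Python sketch below is not CICS or LE code; the class, the condition name and the routine label are all invented for illustration. It only mimics the stacking behavior described above: a push hides the caller's handlers from the called routine, and a pop brings them back.

```python
class HandleTable:
    """Toy stand-in for a task's active handle conditions."""

    def __init__(self):
        self.active = {}   # condition name -> error routine label
        self.saved = []    # stack of suspended handler sets

    def handle(self, condition, label):
        self.active[condition] = label

    def push(self):
        # Suspend the current handlers; the callee starts with none.
        self.saved.append(self.active)
        self.active = {}

    def pop(self):
        # Restore the handlers that were active before the last push.
        self.active = self.saved.pop()

    def raise_condition(self, condition):
        return self.active.get(condition, "default action")


def call_subroutine(table, push_pop_on):
    """Model a call to a lower routine in which NOTFND is raised."""
    if push_pop_on:
        table.push()
    outcome = table.raise_condition("NOTFND")
    if push_pop_on:
        table.pop()
    return outcome


def run(push_pop_on):
    table = HandleTable()
    table.handle("NOTFND", "CALLER-NOTFND-ROUTINE")  # set by the caller
    return call_subroutine(table, push_pop_on)


print("push/pop on: ", run(True))
print("push/pop off:", run(False))
```

With the push and pop in place, the condition raised in the subroutine never reaches the caller's routine. Without them, the caller's routine receives a condition it may not have been written to expect.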
Storage initialization. The STORAGE option controls memory initialization. Its parameters cover three cases: initializing newly acquired heap storage, overwriting heap storage when it is freed, and initializing stack (automatic) storage each time control enters a routine.
Heap initialization tends to be the cheapest in terms of CPU cycles. Initializing stack storage is more expensive, although the cost depends on the number of subroutine calls. Avoid the erase-on-free option unless you work for the National Security Agency.
To avoid the cost of LE storage initialization altogether, follow the best practice of assuming storage is uninitialized and setting every field explicitly within the program before using it.
Stack and heap sizes. LE has its own memory manager, which aims to reduce the number of times a program has to go to the operating system or CICS for more storage. LE gets memory in big blocks that it subdivides as needed. When the block can't satisfy a storage request, LE calls the OS or CICS for another block. A wise choice of initial heap or stack storage will save calls to OS memory management and reduce CPU usage.
Picking the initial block size is more of an art than a science. Small blocks tend to drive up CPU use because LE has to ask for more storage more often, but larger blocks can leave a lot of acquired storage unused.
Fragmentation inside a large storage block is especially troublesome in mixed applications. Consider a hypothetical application with a heap block size of 1 MB, in which one program gets heap storage in 512 KB chunks while another in the same enclave gets heap 32 bytes at a time. As the application runs, the first program gets 512 KB and calls the other, which gets 32 bytes. When the first program tries to get another 512 KB, only 512 KB minus 32 bytes remain in the current block, so LE can't fulfill the request there and must get another block. This leaves almost 0.5 MB of storage unused.
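The arithmetic of that hypothetical case can be checked with a small simulation. The Python sketch below is a deliberately simplified model, assuming requests are carved sequentially from the current block and that leftover space is never reused. The real LE heap manager is more sophisticated, so treat the figures as an illustration only.

```python
KB = 1024
MB = 1024 * KB


def simulate(requests, block_size):
    """Return (blocks_acquired, stranded_bytes) for a list of request sizes.

    Simplified model: each request is carved from the current block.
    If it does not fit, the remainder of that block is counted as
    stranded and a fresh block is acquired.
    """
    blocks_acquired = 0
    free_in_block = 0
    stranded = 0
    for size in requests:
        if size > free_in_block:
            stranded += free_in_block
            # An oversized request gets a block big enough to hold it.
            free_in_block = max(block_size, size)
            blocks_acquired += 1
        free_in_block -= size
    return blocks_acquired, stranded


# The scenario from the text: 512 KB, then 32 bytes, then 512 KB again.
requests = [512 * KB, 32, 512 * KB]
blocks, stranded = simulate(requests, 1 * MB)
print(f"blocks acquired: {blocks}")
print(f"stranded in first block: {stranded} bytes ({stranded / KB:.2f} KB)")
```

Feeding the function a request trace from a real application with a few candidate block sizes gives a first feel for the trade-off between calls for more storage and unused space.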
Study your applications' behaviors before picking an initial heap and stack size. Also make allowances in CICS for the 8-byte "crumple zone" at the beginning and end of every user storage segment. A 4 KB (4,096-byte) LE transaction storage request (GETMAIN) actually occupies 4,112 bytes once the two crumple zones are added, which leads to CICS storage fragmentation. For CICS, request 4,080 bytes instead, which with its crumple zones fits nicely into a 4 KB page. LE also uses some of the fresh storage for its own control blocks, which further reduces the space available to the application.
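The crumple-zone arithmetic is easy to verify. This short Python sketch uses the figures given above (8 bytes at each end, 4 KB pages) and assumes, as the text implies, that what matters is how many whole pages a request touches. The function names are made up for the example.

```python
PAGE = 4096          # bytes in a 4 KB page
CRUMPLE_ZONE = 8     # bytes added at each end of a user storage segment


def footprint(request_bytes):
    """Bytes actually occupied: the request plus both crumple zones."""
    return request_bytes + 2 * CRUMPLE_ZONE


def pages_needed(request_bytes):
    """Whole pages touched by the request, rounding up."""
    return -(-footprint(request_bytes) // PAGE)


for size in (4096, 4080):
    print(f"request {size}: occupies {footprint(size)} bytes, "
          f"{pages_needed(size)} page(s)")
```

The same check works for larger increments: subtract 16 bytes from any multiple of 4,096 to keep the total on a page boundary.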
