
JCO 2011 Korea Developers' Conference

posted May 18, 2011, 10:47 PM by Kuwon Kang   [ updated May 18, 2011, 10:52 PM ]

2011 JCO

posted May 15, 2011, 4:06 PM by Kuwon Kang   [ updated Jun 12, 2011, 10:19 PM ]


This is my first official speaking debut. :)

I will be presenting at 2011 JCO.

The 2011 theme is Cloud, but I learned of the theme rather late.
Had I known earlier, I could have prepared a more closely related topic such as Data Grid or Clustering, which is a bit of a shame.

The topic I will be presenting is as follows:

Application hot-deployment for Enterprise Java Architecture without vendor specialized technology

It is in Track 1.

View the presentation slides


Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine

posted May 3, 2011, 4:13 PM by Kuwon Kang

Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine



Tuning Garbage Collection 
with the 5.0 Java TM Virtual Machine

See also: Performance Docs


Table of Contents

1 Introduction

2 Ergonomics

3 Generations

3.1 Performance Considerations

3.2 Measurement

4 Sizing the Generations

4.1 Total Heap

4.2 The Young Generation

4.2.1 Young Generation Guarantee

5 Types of Collectors

5.1 When to Use the Throughput Collector

5.2 The Throughput Collector

5.2.1 Generations in the throughput collector

5.2.2 Ergonomics in the throughput collector

5.2.2.1 Priority of goals

5.2.2.2 Adjusting Generation Sizes

5.2.2.3 Heap Size

5.2.3 Out-of-Memory Exceptions

5.2.4 Measurements with the Throughput Collector

5.3 When to Use the Concurrent Low Pause Collector

5.4 The Concurrent Low Pause Collector

5.4.1 Overhead of Concurrency

5.4.2 Young Generation Guarantee

5.4.3 Full Collections

5.4.4 Floating Garbage

5.4.5 Pauses

5.4.6 Concurrent Phases

5.4.7 Scheduling a collection

5.4.8 Scheduling pauses

5.4.9 Incremental mode 

    5.4.9.1 Command line 

    5.4.9.2 Recommended Options for i-cms 

    5.4.9.3 Basic Troubleshooting 

5.4.10 Measurements with the Concurrent Collector

6 Other Considerations

7 Conclusion

8 Other Documentation

8.1 Example of Output

8.2 Frequently Asked Questions


  1. Introduction

The Java TM 2 Platform Standard Edition (J2SE TM platform) is used for a wide variety of applications from small applets on desktops to web services on large servers. In the J2SE platform version 1.4.2 there were four garbage collectors from which to choose but without an explicit choice by the user the serial garbage collector was always chosen. In version 5.0 the choice of the collector is based on the class of the machine on which the application is started.

This “smarter choice” of the garbage collector is generally better but is not always the best. For the user who wants to make their own choice of garbage collectors, this document will provide information on which to base that choice. This will first include the general features of garbage collection and the tuning options that take the best advantage of those features. The examples are given in the context of the serial, stop-the-world collector. Then specific features of the other collectors will be discussed along with factors that should be considered when choosing one of the other collectors.

When does the choice of a garbage collector matter to the user? For many applications it doesn't. That is, the application can perform within its specifications in the presence of garbage collection with pauses of modest frequency and duration. An example where this is not the case (when the serial collector is used) would be a large application that scales well to a large number of threads, processors, and sockets, and to a large amount of memory.

Amdahl observed that most workloads cannot be perfectly parallelized; some portion is always sequential and does not benefit from parallelism. This is also true for the J2SE platform. In particular, virtual machines for the Java TM platform up to and including version 1.3.1 do not have parallel garbage collection, so the impact of garbage collection on a multiprocessor system grows relative to an otherwise parallel application.

The graph below models an ideal system that is perfectly scalable with the exception of garbage collection. The red line is an application spending only 1% of the time in garbage collection on a uniprocessor system. This translates to more than a 20% loss in throughput on 32 processor systems. At 10% of the time in garbage collection (not considered an outrageous amount of time in garbage collection in uniprocessor applications) more than 75% of throughput is lost when scaling up to 32 processors.

GC vs. Amdahl's law

This shows that negligible speed issues when developing on small systems may become principal bottlenecks when scaling up to large systems. However, small improvements in reducing such a bottleneck can produce large gains in performance. For a sufficiently large system it becomes well worthwhile to choose the right garbage collector and to tune it if necessary.

The serial collector will be adequate for the majority of applications. Each of the other collectors has some added overhead and/or complexity which is the price for specialized behavior. If the application doesn't need the specialized behavior of an alternate collector, use the serial collector. An example of a situation where the serial collector is not expected to be the best choice is a large application that is heavily threaded and is run on hardware with a large amount of memory and a large number of processors. For such applications, we now make the choice of the throughput collector (see the discussion of ergonomics in section 2).

This document was written using the J2SE Platform version 1.5, on the Solaris TM Operating System (SPARC (R) Platform Edition) as the base platform, because it provides the most scalable hardware and software for the J2SE platform. However, the descriptive text applies to other supported platforms, including Linux, Microsoft Windows, and the Solaris Operating System (x86 Platform Edition), to the extent that scalable hardware is available. Although command line options are consistent across platforms, some platforms may have defaults different than those described here.

  2. Ergonomics

New in the J2SE Platform version 1.5 is a feature referred to here as ergonomics. The goal of ergonomics is to provide good performance from the JVM with a minimum of command line tuning. Ergonomics attempts to match the best selection of

  • Garbage collector

  • Heap size

  • Runtime compiler

for an application. This selection assumes that the class of the machine on which the application is run is a hint as to the characteristics of the application (i.e., large applications run on large machines). In addition to these selections is a simplified way of tuning garbage collection. With the throughput collector the user can specify goals for a maximum pause time and a desired throughput for an application. This is in contrast to specifying the size of the heap that is needed for good performance. This is intended to particularly improve the performance of large applications that use large heaps. The more general ergonomics is described in the document entitled “Ergonomics in the 1.5 Java Virtual Machine”. It is recommended that the ergonomics as presented in this latter document be tried before using the more detailed controls explained in this document.

Included in this document under the throughput collector are the ergonomics features that are provided as part of the new adaptive size policy. This includes the new options to specify goals for the performance of garbage collection and additional options to fine tune that performance.


  3. Generations

One strength of the J2SE platform is that it shields the developer from the complexity of memory allocation and garbage collection. However, once garbage collection is the principal bottleneck, it is worth understanding some aspects of this hidden implementation. Garbage collectors make assumptions about the way applications use objects, and these are reflected in tunable parameters that can be adjusted for improved performance without sacrificing the power of the abstraction.

An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms simply iterate over every reachable object. Any objects left over are then considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.

Beginning with the J2SE Platform version 1.2, the virtual machine incorporated a number of different garbage collection algorithms that are combined using generational collection. While naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to avoid extra work.

The most important of these observed properties is infant mortality. The blue area in the diagram below is a typical distribution for the lifetimes of objects. The X axis is object lifetimes measured in bytes allocated. The byte count on the Y axis is the total bytes in objects with the corresponding lifetime. The sharp peak at the left represents objects that can be reclaimed (i.e., have "died") shortly after being allocated. Iterator objects, for example, are often alive for the duration of a single loop.



histogram with collections

Some objects do live longer, and so the distribution stretches out to the right. For instance, there are typically some objects allocated at initialization that live until the process exits. Between these two extremes are objects that live for the duration of some intermediate computation, seen here as the lump to the right of the infant mortality peak. Some applications have very different looking distributions, but a surprisingly large number possess this general shape. Efficient collection is made possible by focusing on the fact that a majority of objects "die young".

To optimize for this scenario, memory is managed in generations, or memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. Objects are allocated in a generation for younger objects, or the young generation, and because of infant mortality most objects die there. When the young generation fills up it causes a minor collection. Minor collections can be optimized assuming a high infant mortality rate. The costs of such collections are, to the first order, proportional to the number of live objects being collected. A young generation full of dead objects is collected very quickly. Some surviving objects are moved to a tenured generation. When the tenured generation needs to be collected there is a major collection that is often much slower because it involves all live objects.

The diagram below shows minor collections occurring at intervals long enough to allow many of the objects to die between collections. It is well-tuned in the sense that the young generation is large enough (and thus the period between minor collections long enough) that the minor collection can take advantage of the high infant mortality rate. This situation can be upset by applications with unusual lifetime distributions, or by poorly sized generations that cause collections to occur before objects have had time to die.

As noted in section 2, ergonomics now makes different choices of the garbage collector in order to provide good performance on a variety of applications. The serial garbage collector is meant to be used by small applications. Its default parameters were designed to be effective for most small applications. The throughput garbage collector is meant to be used by large applications. The heap size parameters selected by ergonomics plus the features of the adaptive size policy are meant to provide good performance for server applications. These choices work well for many applications but do not always work. This leads to the central tenet of this document:

If the garbage collector has become a bottleneck, you may wish to customize the generation sizes. Check the verbose garbage collector output, and then explore the sensitivity of your individual performance metric to the garbage collector parameters.





The default arrangement of generations (for all collectors with the exception of the throughput collector) looks something like this.

space usage by generations

At initialization, a maximum address space is virtually reserved but not allocated to physical memory unless it is needed. The complete address space reserved for object memory can be divided into the young and tenured generations.

The young generation consists of eden plus two survivor spaces. Objects are initially allocated in eden. One survivor space is empty at any time, and serves as the destination of the next copying collection of any live objects in eden and the other survivor space. Objects are copied between survivor spaces in this way until they are old enough to be tenured, or copied to the tenured generation.

Other virtual machines, including the production virtual machine for the J2SE Platform version 1.2 for the Solaris Operating System, used two equally sized spaces for copying rather than one large eden plus two small spaces. This means the options for sizing the young generation are not directly comparable; see the Performance FAQ for an example.

A third generation closely related to the tenured generation is the permanent generation. The permanent generation is special because it holds data needed by the virtual machine to describe objects that do not have an equivalent at the Java language level. For example, objects describing classes and methods are stored in the permanent generation.



3.1 Performance Considerations

    There are two primary measures of garbage collection performance. Throughput is the percentage of total time not spent in garbage collection, considered over long periods of time. Throughput includes time spent in allocation (but tuning for speed of allocation is generally not needed.) Pauses are the times when an application appears unresponsive because garbage collection is occurring.

    Users have different requirements of garbage collection. For example, some consider the right metric for a web server to be throughput, since pauses during garbage collection may be tolerable, or simply obscured by network latencies. However, in an interactive graphics program even short pauses may negatively affect the user experience.

    Some users are sensitive to other considerations. Footprint is the working set of a process, measured in pages and cache lines. On systems with limited physical memory or many processes, footprint may dictate scalability. Promptness is the time between when an object becomes dead and when the memory becomes available, an important consideration for distributed systems, including remote method invocation (RMI).

    In general, a particular generation sizing chooses a trade-off between these considerations. For example, a very large young generation may maximize throughput, but does so at the expense of footprint, promptness, and pause times. Young generation pauses can be minimized by using a small young generation at the expense of throughput. To a first approximation, the sizing of one generation does not affect the collection frequency and pause times for another generation.

    There is no one right way to size generations. The best choice is determined by the way the application uses memory as well as user requirements. For this reason the virtual machine's choice of a garbage collector is not always optimal, and may be overridden by the user in the form of command line options, described below.

    3.2 Measurement

      Throughput and footprint are best measured using metrics particular to the application. For example, throughput of a web server may be tested using a client load generator, while footprint of the server might be measured on the Solaris Operating System using the pmap command. On the other hand, pauses due to garbage collection are easily estimated by inspecting the diagnostic output of the virtual machine itself.

      The command line argument -verbose:gc prints information at every collection. Note that the format of the -verbose:gc output is subject to change between releases of the J2SE platform. For example, here is output from a large server application:

        [GC 325407K->83000K(776768K), 0.2300771 secs] 
        [GC 325816K->83372K(776768K), 0.2454258 secs] 
        [Full GC 267628K->83769K(776768K), 1.8479984 secs]

      Here we see two minor collections and one major one. The numbers before and after the arrow

      325407K->83000K (in the first line)


      indicate the combined size of live objects before and after garbage collection, respectively. After minor collections the count includes objects that aren't necessarily alive but can't be reclaimed, either because they are directly alive, or because they are within or referenced from the tenured generation. The number in parenthesis

      (776768K) (in the first line)


      is the total available space, not counting the space in the permanent generation, which is the total heap minus one of the survivor spaces. The minor collection took about a quarter of a second.

      0.2300771 secs (in the first line)

      The format for the major collection in the third line is similar. The flag -XX:+PrintGCDetails prints additional information about the collections. The additional information printed with this flag is liable to change with each version of the virtual machine. The additional output with the -XX:+PrintGCDetails flag in particular changes with the needs of the development of the Java Virtual Machine. An example of the output with -XX:+PrintGCDetails for the J2SE Platform version 1.5 using the serial garbage collector is shown here.

      [GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K), 0.0459067 secs]

      indicates that the minor collection recovered about 98% of the young generation,

      DefNew: 64575K->959K(64576K)

      and took about 46 milliseconds.

      0.0457646 secs

      The usage of the entire heap was reduced to about 51%

      196016K->133633K(261184K)

      and that there was some slight additional overhead for the collection (over and above the collection of the young generation) as indicated by the final time:

      0.0459067 secs

      The flag -XX:+PrintGCTimeStamps will additionally print a time stamp at the start of each collection.

      111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.0000505 secs]111.042: [Tenured: 18154K->2311K(24576K), 0.1290354 secs] 26282K->2311K(32704K), 0.1293306 secs]

      The collection starts about 111 seconds into the execution of the application. The minor collection starts at about the same time. Additionally, information is shown for a major collection, delineated by Tenured. The tenured generation usage was reduced to about 10%

      18154K->2311K(24576K)

      and took about .13 seconds.

      0.1290354 secs
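The number extraction illustrated above can be done programmatically. The following is a minimal sketch of a parser for the 1.5-era -verbose:gc line format shown in these examples; since the format is explicitly subject to change between releases, treat it as illustrative only, and note that the class and field names are our own, not part of the JVM.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerboseGcLine {
    // Matches lines like "[GC 325407K->83000K(776768K), 0.2300771 secs]"
    // or "[Full GC 267628K->83769K(776768K), 1.8479984 secs]".
    private static final Pattern LINE = Pattern.compile(
            "\\[(GC|Full GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean major;   // true for "Full GC" (a major collection)
    final long beforeK;    // combined size of live objects before, in KB
    final long afterK;     // combined size of live objects after, in KB
    final long totalK;     // total available space, in KB
    final double seconds;  // pause duration

    VerboseGcLine(boolean major, long beforeK, long afterK,
                  long totalK, double seconds) {
        this.major = major;
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.totalK = totalK;
        this.seconds = seconds;
    }

    static VerboseGcLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized GC line: " + line);
        }
        return new VerboseGcLine("Full GC".equals(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5)));
    }
}
```

A parser like this can be pointed at a captured log to compute total pause time or the live-data trend over a run.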

      4. Sizing the Generations

      A number of parameters affect generation size. The following diagram illustrates the difference between committed space and virtual space in the heap. At initialization of the virtual machine, the entire space for the heap is reserved. The size of the space reserved can be specified with the -Xmx option. If the value of the -Xms parameter is smaller than the value of the -Xmx parameter, not all of the space that is reserved is immediately committed to the virtual machine. The uncommitted space is labeled "virtual" in this figure. The different parts of the heap ( permanent generation, tenured generation, and young generation) can grow to the limit of the virtual space as needed.

      Some of the parameters are ratios of one part of the heap to another. For example the parameter NewRatio denotes the relative size of the tenured generation to the young generation. These parameters are discussed below.

      options affecting sizing

      The discussion that follows regarding the growing and shrinking of the heap does not apply to the throughput collector. The resizing of the heap for the throughput collector is governed by the ergonomics discussed in section 5.2.2. The parameters that control the total size of the heap and the sizes of the generations do apply to the throughput collector.

      4.1 Total Heap

        Since collections occur when generations fill up, throughput is inversely proportional to the amount of memory available. Total available memory is the most important factor affecting garbage collection performance.

        By default, the virtual machine grows or shrinks the heap at each collection to try to keep the proportion of free space to live objects at each collection within a specific range. This target range is set as a percentage by the parameters -XX:MinHeapFreeRatio=<minimum> and -XX:MaxHeapFreeRatio=<maximum>, and the total size is bounded below by -Xms and above by -Xmx . The default parameters for the 32-bit Solaris Operating System (SPARC Platform Edition) are shown in this table:


        Parameter                  Default Value
        -XX:MinHeapFreeRatio=      40
        -XX:MaxHeapFreeRatio=      70
        -Xms                       3670k
        -Xmx                       64m


        Default values of heap size parameters on 64-bit systems have been scaled up by approximately 30%. This increase is meant to compensate for the larger size of objects on a 64-bit system.

        With these parameters if the percent of free space in a generation falls below 40%, the size of the generation will be expanded so as to have 40% of the space free, assuming the size of the generation has not already reached its limit. Similarly, if the percent of free space exceeds 70%, the size of the generation will be shrunk so as to have only 70% of the space free as long as shrinking the generation does not decrease it below the minimum size of the generation.
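The grow/shrink rule above can be sketched as simple arithmetic. This is an illustrative model only, with our own method names; the real policy also respects the -Xms/-Xmx bounds and size alignment.

```java
public class HeapResize {
    // Computes a target generation size (in KB) so that the free-space
    // percentage falls back inside [minFreePct, maxFreePct], mirroring the
    // behavior of -XX:MinHeapFreeRatio / -XX:MaxHeapFreeRatio.
    static long targetSize(long liveK, long currentK,
                           int minFreePct, int maxFreePct) {
        double freePct = 100.0 * (currentK - liveK) / currentK;
        if (freePct < minFreePct) {
            // Grow until minFreePct of the space is free.
            return Math.round(liveK / (1.0 - minFreePct / 100.0));
        }
        if (freePct > maxFreePct) {
            // Shrink until only maxFreePct of the space is free.
            return Math.round(liveK / (1.0 - maxFreePct / 100.0));
        }
        return currentK; // Within the target range: leave the size alone.
    }
}
```

For example, 70000K of live data in a 100000K generation is only 30% free, so with the defaults the generation would grow to about 116667K (40% free).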

        Large server applications often experience two problems with these defaults. One is slow startup, because the initial heap is small and must be resized over many major collections. A more pressing problem is that the default maximum heap size is unreasonably small for most server applications. The rules of thumb for server applications are:



          • Unless you have problems with pauses, try granting as much memory as possible to the virtual machine. The default size (64MB) is often too small.

          • Setting -Xms and -Xmx to the same value increases predictability by removing the most important sizing decision from the virtual machine. On the other hand, the virtual machine can't compensate if you make a poor choice.

          • Be sure to increase the memory as you increase the number of processors, since allocation can be parallelized.



        A description of other virtual machine options can be found at

        http://java.sun.com/docs/hotspot/VMOptions.html


        4.2 The Young Generation

          The second most influential knob is the proportion of the heap dedicated to the young generation. The bigger the young generation, the less often minor collections occur. However, for a bounded heap size a larger young generation implies a smaller tenured generation, which will increase the frequency of major collections. The optimal choice depends on the lifetime distribution of the objects allocated by the application.

          By default, the young generation size is controlled by NewRatio. For example, setting -XX:NewRatio=3 means that the ratio between the young and tenured generation is 1:3. In other words, the combined size of the eden and survivor spaces will be one fourth of the total heap size.
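The NewRatio arithmetic can be sketched as follows (an illustrative helper with our own names, not a JVM API):

```java
public class NewRatioMath {
    // With -XX:NewRatio=n, young:tenured is 1:n, so the young generation
    // (eden plus both survivor spaces) is 1/(n+1) of the total heap.
    static long youngGenSize(long totalHeapK, int newRatio) {
        return totalHeapK / (newRatio + 1);
    }
}
```

So with -XX:NewRatio=3 a 256MB heap would give the young generation about 64MB, one fourth of the total.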

          The parameters NewSize and MaxNewSize bound the young generation size from below and above. Setting these equal to one another fixes the young generation, just as setting -Xms and -Xmx equal fixes the total heap size. This is useful for tuning the young generation at a finer granularity than the integral multiples allowed by NewRatio.

          4.2.1 Young Generation Guarantee

          In an ideal minor collection the live objects are copied from one part of the young generation (the eden space plus the first survivor space) to another part of the young generation (the second survivor space). However, there is no guarantee that all the live objects will fit into the second survivor space. To ensure that the minor collection can complete even if all the objects are live, enough free memory must be reserved in the tenured generation to accommodate all the live objects. In the worst case, this reserved memory is equal to the size of eden plus the objects in the non-empty survivor space. When there isn't enough memory available in the tenured generation for this worst case, a major collection will occur instead. This policy is fine for small applications, because the memory reserved in the tenured generation is typically only virtually committed but not actually used. But for applications needing the largest possible heap, an eden bigger than half the virtually committed size of the heap is useless: only major collections would occur. Note that the young generation guarantee applies only to the serial collector. The throughput collector and the concurrent collector will proceed with a young generation collection, and if the tenured generation cannot accommodate all the promotions from the young generation, both generations are collected.



          If desired, the parameter SurvivorRatio can be used to tune the size of the survivor spaces, but this is often not as important for performance. For example, -XX:SurvivorRatio=6 sets the ratio between each survivor space and eden to be 1:6. In other words, each survivor space will be one eighth of the young generation (not one seventh, because there are two survivor spaces).
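The survivor-space arithmetic can be made concrete with a small sketch (helper names are ours, for illustration):

```java
public class SurvivorRatioMath {
    // With -XX:SurvivorRatio=n, eden:survivor is n:1, and since the young
    // generation is eden plus TWO survivor spaces, each survivor space is
    // 1/(n+2) of the young generation.
    static long survivorSize(long youngGenK, int survivorRatio) {
        return youngGenK / (survivorRatio + 2);
    }

    static long edenSize(long youngGenK, int survivorRatio) {
        return youngGenK - 2 * survivorSize(youngGenK, survivorRatio);
    }
}
```

With -XX:SurvivorRatio=6 and a 64MB young generation, each survivor space is 8MB (one eighth) and eden is 48MB, preserving the 6:1 eden-to-survivor ratio.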

          If survivor spaces are too small, copying collection overflows directly into the tenured generation. If survivor spaces are too large, they will be uselessly empty. At each garbage collection the virtual machine chooses a threshold number of times an object can be copied before it is tenured. This threshold is chosen to keep the survivors half full. An option, -XX:+PrintTenuringDistribution, can be used to show this threshold and the ages of objects in the new generation. It is also useful for observing the lifetime distribution of an application.

          Here are the default values for the 32-bit Solaris Operating System (SPARC Platform Edition):


          Parameter        Default Value
          NewRatio         2 (client JVM: 8)
          NewSize          2228k
          MaxNewSize       not limited
          SurvivorRatio    32



          The maximum size of the young generation will be calculated from the maximum size of the total heap and NewRatio. The "not limited" default value for MaxNewSize means that the calculated value is not limited by MaxNewSize unless a value for MaxNewSize is specified on the command line.

          The rules of thumb for server applications are:

          • First decide the total amount of memory you can afford to give the virtual machine. Then graph your own performance metric against young generation sizes to find the best setting.

          • Unless you find problems with excessive major collection or pause times, grant plenty of memory to the young generation.

          • Increasing the young generation becomes counterproductive at half the total heap or less (whenever the young generation guarantee cannot be met).

          • Be sure to increase the young generation as you increase the number of processors, since allocation can be parallelized.





          5. Types of Collectors

          The discussion to this point has been about the serial collector. In the J2SE Platform version 1.5 there are three additional collectors. Each is a generational collector which has been implemented to emphasize the throughput of the application or low garbage collection pause times.

          1. The throughput collector: this collector uses a parallel version of the young generation collector. It is used if the -XX:+UseParallelGC option is passed on the command line. The tenured generation collector is the same as the serial collector.

          2. The concurrent low pause collector: this collector is used if the option -Xincgc™ or -XX:+UseConcMarkSweepGC is passed on the command line. The concurrent collector is used to collect the tenured generation and does most of the collection concurrently with the execution of the application. The application is paused for short periods during the collection. A parallel version of the young generation copying collector is used with the concurrent collector.

          3. The incremental (sometimes called train) low pause collector: this collector is used only if -XX:+UseTrainGC is passed on the command line. This collector has not changed since the J2SE Platform version 1.4.2 and is currently not under active development. It will not be supported in future releases. Please see the 1.4.2 GC Tuning Document for information on this collector.

          Note that -XX:+UseParallelGC should not be used with -XX:+UseConcMarkSweepGC. The argument parsing in the J2SE Platform starting with version 1.4.2 should only allow legal combinations of command line options for garbage collectors, but earlier releases may not detect all illegal combinations and the results for illegal combinations are unpredictable.

          Always try the collector chosen by the JVM on your application before explicitly selecting another collector. Tune the heap size for your application and then consider what requirements of your application are not being met. Based on the latter, consider using one of the other collectors.

          5.1 When to Use the Throughput Collector

          Use the throughput collector when you want to improve the performance of your application with larger numbers of processors. In the serial collector garbage collection is done by one thread, and therefore garbage collection adds to the serial execution time of the application. The throughput collector uses multiple threads to execute a minor collection and so reduces the serial execution time of the application. A typical situation is one in which the application has a large number of threads allocating objects. In such an application it is often the case that a large young generation is needed.

          5.2 The Throughput Collector

            The throughput collector is a generational collector similar to the serial collector but with multiple threads used to do the minor collection. The major collections are essentially the same as with the serial collector. By default on a host with N CPUs, the throughput collector uses N garbage collector threads in the minor collection. The number of garbage collector threads can be controlled with a command line option (see below). On a host with 1 CPU the throughput collector will likely not perform as well as the serial collector because of the additional overhead for the parallel execution (e.g., synchronization costs). On a host with 2 CPUs the throughput collector generally performs as well as the serial garbage collector, and a reduction in the minor garbage collector pause times can be expected on hosts with more than 2 CPUs.

            The throughput collector can be enabled by using the command line flag -XX:+UseParallelGC. The number of garbage collector threads can be controlled with the ParallelGCThreads command line option (-XX:ParallelGCThreads=<desired number>). If explicit tuning of the heap is being done with command line flags, the size of the heap needed for good performance with the throughput collector is to first order the same as needed with the serial collector. Turning on the throughput collector should just make the minor collection pauses shorter. Because there are multiple garbage collector threads participating in the minor collection, there is a small possibility of fragmentation due to promotions from the young generation to the tenured generation during the collection. Each garbage collection thread reserves a part of the tenured generation for promotions, and the division of the available space into these "promotion buffers" can cause a fragmentation effect. Reducing the number of garbage collector threads will reduce this fragmentation effect, as will increasing the size of the tenured generation.
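Which collectors the JVM actually selected, and how much time they have spent, can also be confirmed at runtime through the java.lang.management API (itself new in J2SE 5.0). This small sketch simply lists the active collectors; it works with any collector choice and is not specific to the throughput collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcInfo {
    public static void main(String[] args) {
        // One MXBean per active collector (e.g., one for the young
        // generation and one for the tenured generation).
        List<GarbageCollectorMXBean> gcs =
                ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : gcs) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

This complements -verbose:gc when an application wants to monitor its own collection overhead.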

            5.2.1 Generations in the throughput collector

            As mentioned earlier, the arrangement of the generations is different in the throughput collector.

            5.2.2 Ergonomics in the throughput collector

            In the J2SE Platform version 1.5 the throughput collector will be chosen as the garbage collector on server class machines. The document Ergonomics in the 5.0 Java Virtual Machine discusses this selection of the garbage collector. For the throughput collector a new method of tuning has been added which is based on a desired behavior of the application with respect to garbage collection. The following command line flags can be used to specify the desired behavior in terms of goals for the maximum pause time and the throughput for the application.

            The maximum pause time goal is specified with the command line flag

            -XX:MaxGCPauseMillis=<nnn>

            This is interpreted as a hint to the throughput collector that pause times of <nnn> milliseconds or less are desired. The throughput collector will adjust the Java heap size and other garbage collection related parameters in an attempt to keep garbage collection pauses shorter than <nnn> milliseconds. These adjustments may cause the garbage collector to reduce the overall throughput of the application, and in some cases the desired pause time goal cannot be met. By default there is no maximum pause time goal.


            The throughput goal is measured in terms of the time spent doing garbage collection and the time spent outside of garbage collection (referred to as application time). The goal is specified by the command line flag

            -XX:GCTimeRatio=<nnn>

            The ratio of garbage collection time to application time is


            1 / (1 + <nnn>)

            For example -XX:GCTimeRatio=19 sets a goal of 5% of the total time for garbage collection. By default the goal for total time for garbage collection is 1%.
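The arithmetic above can be checked with a few lines of Java (the class name is illustrative; the value 99 for the 1% default is implied by the text rather than stated as a flag value):

```java
// Fraction of total time spent in GC implied by -XX:GCTimeRatio=<nnn>,
// per the formula 1 / (1 + <nnn>) given in the text.
public class GcTimeRatio {
    static double gcFraction(int nnn) {
        return 1.0 / (1 + nnn);
    }

    public static void main(String[] args) {
        // GCTimeRatio=19 -> 1/(1+19) = 5% of total time in GC.
        System.out.println(gcFraction(19));   // 0.05
        // The default 1% goal corresponds to GCTimeRatio=99.
        System.out.println(gcFraction(99));   // 0.01
    }
}
```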

            Additionally, as an implicit goal the throughput collector will try to meet the other goals in the smallest heap that it can.

            5.2.2.1 Priority of goals

            The goals are addressed in the following order

            • Maximum pause time goal

            • Throughput goal

            • Minimum footprint goal

            The maximum pause time goal is met first. Only after it is met is the throughput goal addressed. Similarly, only after the first two goals have been met is the footprint goal considered.

            5.2.2.2 Adjusting Generation Sizes

            The statistics (e.g., average pause time) kept by the collector are updated at the end of a collection. The tests to determine whether the goals have been met are then made and any needed adjustments to the sizes of the generations are made. The exception is that explicit garbage collections (calls to System.gc()) are ignored when keeping statistics and making adjustments to the sizes of the generations.

            Growing and shrinking the size of a generation is done by increments that are a fixed percentage of the size of the generation. A generation steps up or down toward its desired size. Growing and shrinking are done at different rates. By default a generation grows in increments of 20% and shrinks in increments of 5%. The percentage for growing is controlled by the command line flag -XX:YoungGenerationSizeIncrement=<nnn> for the young generation and -XX:TenuredGenerationSizeIncrement=<nnn> for the tenured generation. The percentage by which a generation shrinks is adjusted by the command line flag -XX:AdaptiveSizeDecrementScaleFactor=<nnn>. If the size of an increment for growing is XXX percent, the size of the decrement for shrinking will be XXX / nnn percent.
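As a sketch of the step-size rule above (the class name is illustrative; the default 20% grow / 5% shrink pair quoted in the text implies a decrement scale factor of 4):

```java
// Shrink step derived from the grow step, per the rule
// "decrement = increment / AdaptiveSizeDecrementScaleFactor".
public class SizeSteps {
    static double shrinkPercent(double growPercent, int decrementScaleFactor) {
        return growPercent / decrementScaleFactor;
    }

    public static void main(String[] args) {
        // 20% grow with a scale factor of 4 gives the default 5% shrink.
        System.out.println(shrinkPercent(20.0, 4)); // 5.0
    }
}
```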

            If the collector decides to grow a generation at startup, there is a supplemental percentage added to the increment. This supplement decays with the number of collections and there is no long term effect of this supplement. The intent of the supplement is to increase startup performance. There is no supplement to the percentage for shrinking.

            If the maximum pause time goal is not being met, the size of only one generation is shrunk at a time. If the pause times of both generations are above the goal, the size of the generation with the larger pause time is shrunk first.

            If the throughput goal is not being met, the sizes of both generations are increased. Each is increased in proportion to its respective contribution to the total garbage collection time. For example, if the garbage collection time of the young generation is 25% of the total collection time and if a full increment of the young generation would be by 20%, then the young generation would be increased by 5%.
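The worked example above can be written out directly (the class name is illustrative):

```java
// Each generation grows in proportion to its share of total GC time.
public class ProportionalGrowth {
    static double effectiveIncrementPercent(double fullIncrementPercent,
                                            double shareOfGcTime) {
        return fullIncrementPercent * shareOfGcTime;
    }

    public static void main(String[] args) {
        // Young generation contributes 25% of GC time; a full increment
        // would be 20%, so the young generation grows by 5%.
        System.out.println(effectiveIncrementPercent(20.0, 0.25)); // 5.0
    }
}
```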

            5.2.2.3 Heap Size

            If not otherwise set on the command line, the sizes of the initial heap and maximum heap are calculated based on the size of the physical memory. If phys_mem is the size of the physical memory on the platform, the initial heap size will be set to phys_mem / DefaultInitialRAMFraction. DefaultInitialRAMFraction is a command line option with a default value of 64. Similarly the maximum heap size will be set to phys_mem / DefaultMaxRAMFraction. DefaultMaxRAMFraction has a default value of 4.
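For example, the default sizing rule works out as follows (a sketch; the class name and the 2 GB machine are illustrative):

```java
// Default initial/maximum heap sizes derived from physical memory,
// per the DefaultInitialRAMFraction (64) and DefaultMaxRAMFraction (4)
// divisors described above. All values in megabytes.
public class DefaultHeapSizes {
    static long initialHeap(long physMemMb) { return physMemMb / 64; }
    static long maxHeap(long physMemMb)     { return physMemMb / 4;  }

    public static void main(String[] args) {
        long physMemMb = 2048;                       // a 2 GB machine
        System.out.println(initialHeap(physMemMb));  // 32
        System.out.println(maxHeap(physMemMb));      // 512
    }
}
```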

            5.2.3 Out-of-Memory Exceptions

            The throughput collector will throw an out-of-memory exception if too much time is being spent doing garbage collection. For example, if the JVM is spending more than 98% of the total time doing garbage collection and is recovering less than 2% of the heap, it will throw an out-of-memory exception. The implementation of this feature has changed in 1.5. The policy is the same but there may be slight differences in behavior due to the new implementation.

            5.2.4 Measurements with the Throughput Collector

            The verbose garbage collector output is the same for the throughput collector as with the serial collector.

            5.3 When to Use the Concurrent Low Pause Collector

              Use the concurrent low pause collector if your application would benefit from shorter garbage collector pauses and can afford to share processor resources with the garbage collector when the application is running. Typically applications which have a relatively large set of long-lived data (a large tenured generation), and run on machines with two or more processors tend to benefit from the use of this collector. However, this collector should be considered for any application with a low pause time requirement. Optimal results have been observed for interactive applications with tenured generations of a modest size on a single processor.

              5.4 The Concurrent Low Pause Collector

                The concurrent low pause collector is a generational collector similar to the serial collector. The tenured generation is collected concurrently with this collector.

                This collector attempts to reduce the pause times needed to collect the tenured generation. It uses a separate garbage collector thread to do parts of the major collection concurrently with the application threads. The concurrent collector is enabled with the command line option -XX:+UseConcMarkSweepGC. For each major collection the concurrent collector will pause all the application threads for a brief period at the beginning of the collection and toward the middle of the collection. The second pause tends to be the longer of the two pauses and multiple threads are used to do the collection work during that pause. The remainder of the collection is done with a garbage collector thread that runs concurrently with the application. The minor collections are done in a manner similar to the serial collector, although multiple threads are used to do the collection. See "Parallel Minor Collection Options with the Concurrent Collector" below for information on using multiple threads with the concurrent low pause collector.

                The techniques used in the concurrent collector (for the collection of the tenured generation) are described at:

                http://research.sun.com/techrep/2000/abstract-88.html

                5.4.1 Overhead of Concurrency

                The concurrent collector trades processor resources (which would otherwise be available to the application) for shorter major collection pause times. The concurrent part of the collection is done by a single garbage collection thread. On an N processor system, when the concurrent part of the collection is running it will be using 1/Nth of the available processor power. On a uniprocessor machine it would be fortuitous if it provided any advantage (see the section on incremental mode for the exception to this statement). The concurrent collector also has some additional overhead costs that will take away from the throughput of the application, and some inherent disadvantages (e.g., fragmentation) for some types of applications. On a two processor machine there is a processor available for application threads while the concurrent part of the collection is running, so running the concurrent garbage collector thread does not "pause" the application. There may be reduced pause times as intended for the concurrent collector, but again fewer processor resources are available to the application and some slowdown of the application should be expected. As N increases, the reduction in processor resources due to the running of the concurrent garbage collector thread becomes smaller, and the advantages of the concurrent collector become greater.

                5.4.2 Young Generation Guarantee

                Prior to J2SE Platform version 1.5 the concurrent collector had to satisfy the young generation guarantee just as the serial collector does. Starting with J2SE Platform version 1.5 this is no longer true. The concurrent collector can recover if it starts a young generation collection and there is not enough space in the tenured generation to hold all the objects that require promotion from the young generation. This is similar to the throughput collector.

                5.4.3 Full Collections

                The concurrent collector uses a single garbage collector thread that runs simultaneously with the application threads with the goal of completing the collection of the tenured generation before it becomes full. In normal operation, the concurrent collector is able to do most of its work with the application threads still running, so only brief pauses are seen by the application threads. As a fall back, if the concurrent collector is unable to finish before the tenured generation fills up, the application is paused and the collection is completed with all the application threads stopped. Such collections with the application stopped are referred to as full collections and are a sign that some adjustments need to be made to the concurrent collection parameters.

                5.4.4 Floating Garbage

                A garbage collector works to find the live objects in the heap. Because application threads and the garbage collector thread run concurrently during a major collection, objects that are found to be alive by the garbage collector thread may become dead by the time the collection finishes. Such objects are referred to as floating garbage. The amount of floating garbage depends on the length of the concurrent collection (more time for the application threads to discard an object) and on the particulars of the application. As a rough rule of thumb, try increasing the size of the tenured generation by 20% to account for the floating garbage. Floating garbage is collected at the next garbage collection.

                5.4.5 Pauses

                The concurrent collector pauses an application twice during a concurrent collection cycle. The first pause is to mark as live the objects directly reachable from the roots (e.g., objects on thread stack, static objects and so on) and elsewhere in the heap (e.g., the young generation). This first pause is referred to as the initial mark. The second pause comes at the end of the marking phase and finds objects that were missed during the concurrent marking phase due to the concurrent execution of the application threads. The second pause is referred to as the remark.

                5.4.6 Concurrent Phases

                The concurrent marking occurs between the initial mark and the remark. During the concurrent marking the concurrent garbage collector thread is executing and using processor resources that would otherwise be available to the application. After the remark there is a concurrent sweeping phase which collects the dead objects. During this phase the concurrent garbage collector thread is again taking processor resources from the application. After the sweeping phase the concurrent collector sleeps until the start of the next major collection.

                5.4.7 Scheduling a collection

                With the serial collector a major collection is started when the tenured generation becomes full and all application threads are stopped while the collection is done. In contrast a concurrent collection should be started at a time such that the collection can finish before the tenured generation becomes full. There are several ways a concurrent collection can be started.

                The concurrent collector keeps statistics on the time remaining before the tenured generation is full (T-until-full) and on the time needed to do a concurrent collection (T-collect). When the T-until-full approaches T-collect, a concurrent collection is started. This test is appropriately padded so as to start a collection conservatively early.

                A concurrent collection will also start if the occupancy of the tenured generation grows above the initiating occupancy (i.e., the percentage of the current heap that is used before a concurrent collection is started). The initiating occupancy by default is set to about 68%. It can be set with the parameter CMSInitiatingOccupancyFraction which can be set on the command line with the flag

                -XX:CMSInitiatingOccupancyFraction=<nn>

                The value <nn> is a percentage of the current tenured generation size.
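As a sketch of what the initiating occupancy means in absolute terms (the class name and the tenured generation size are illustrative):

```java
// Tenured-generation occupancy at which a concurrent collection is
// started, for a given CMSInitiatingOccupancyFraction. Values in KB.
public class InitiatingOccupancy {
    static long thresholdKb(long tenuredSizeKb, int occupancyPercent) {
        return tenuredSizeKb * occupancyPercent / 100;
    }

    public static void main(String[] args) {
        // The default of about 68% on a 20480 KB tenured generation:
        // a concurrent collection starts once ~13926 KB are occupied.
        System.out.println(thresholdKb(20480, 68)); // 13926
    }
}
```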

                5.4.8 Scheduling pauses

                The pauses for the young generation collection and the tenured generation collection occur independently. They cannot overlap, but they can occur in quick succession such that the pause from one collection immediately followed by one from the other collection can appear to be a single, longer pause. To avoid this the remark pauses for a concurrent collection are scheduled to be midway between the previous and next young generation pauses. The initial mark pause is typically too short to be worth scheduling.

                5.4.9 Incremental mode

                The concurrent collector can be used in a mode in which the concurrent phases are done incrementally. Recall that during a concurrent phase the garbage collector thread is using a processor. The incremental mode is meant to lessen the impact of long concurrent phases by periodically stopping the concurrent phase to yield the processor back to the application. This mode (referred to here as “i-cms”) divides the work done concurrently by the collector into small chunks of time which are scheduled between young generation collections. This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or 2).

                The concurrent collection cycle typically includes the following steps:

                • stop all application threads; do the initial mark; resume all application threads

                • do the concurrent mark (uses one processor for the concurrent work)

                • do the concurrent pre-clean (uses one processor for the concurrent work)

                • stop all application threads; do the remark; resume all application threads

                • do the concurrent sweep (uses one processor for the concurrent work)

                • do the concurrent reset (uses one processor for the concurrent work)

                Normally, the concurrent collector uses one processor for the concurrent work for the entire concurrent mark phase, without (voluntarily) relinquishing it. Similarly, one processor is used for the entire concurrent sweep phase, again without relinquishing it. This processor utilization can be too much of a disruption for applications with pause time constraints, particularly when run on systems with just one or two processors. i-cms solves this problem by breaking up the concurrent phases into short bursts of activity, which are scheduled to occur mid-way between minor pauses.

                I-cms uses a "duty cycle" to control the amount of work the concurrent collector is allowed to do before voluntarily giving up the processor. The duty cycle is the percentage of time between young generation collections that the concurrent collector is allowed to run. I-cms can automatically compute the duty cycle based on the behavior of the application (the recommended method), or the duty cycle can be set to a fixed value on the command line.
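The duty cycle can be made concrete with a little arithmetic (a sketch; the class name and the 500 ms interval are illustrative, and the default duty cycle of 50 is taken from the option table below):

```java
// How long the concurrent collector may run between two young
// generation collections under a given i-cms duty cycle.
public class DutyCycle {
    static double allowedMillis(double millisBetweenMinorGCs, int dutyCyclePercent) {
        return millisBetweenMinorGCs * dutyCyclePercent / 100.0;
    }

    public static void main(String[] args) {
        // With 500 ms between minor collections and the default duty
        // cycle of 50, the collector may run up to 250 ms per interval.
        System.out.println(allowedMillis(500.0, 50)); // 250.0
    }
}
```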

                5.4.9.1 Command line

                The following command-line options control i-cms (see below for recommendations for an initial set of options):

                -XX:+CMSIncrementalMode default: disabled

                This flag enables the incremental mode. Note that the concurrent collector must be enabled (with -XX:+UseConcMarkSweepGC) for this option to work.

                -XX:+CMSIncrementalPacing default: disabled

                This flag enables automatic adjustment of the incremental mode duty cycle based on statistics collected while the JVM is running.

                -XX:CMSIncrementalDutyCycle=<N> default: 50

                This is the percentage (0-100) of time between minor collections that the concurrent collector is allowed to run. If CMSIncrementalPacing is enabled, then this is just the initial value.

                -XX:CMSIncrementalDutyCycleMin=<N> default: 10

                This is the percentage (0-100) which is the lower bound on the duty cycle when CMSIncrementalPacing is enabled.

                -XX:CMSIncrementalSafetyFactor=<N> default: 10

                This is the percentage (0-100) used to add conservatism when computing the duty cycle.

                -XX:CMSIncrementalOffset=<N> default: 0

                This is the percentage (0-100) by which the incremental mode duty cycle is shifted to the right within the period between minor collections.

                -XX:CMSExpAvgFactor=<N> default: 25

                This is the percentage (0-100) used to weight the current sample when computing exponential averages for the concurrent collection statistics.

                5.4.9.2 Recommended Options for i-cms

                When trying i-cms, we recommend the following as an initial set of command line options:

                -XX:+UseConcMarkSweepGC \

                -XX:+CMSIncrementalMode \

                -XX:+CMSIncrementalPacing \

                -XX:CMSIncrementalDutyCycleMin=0 \

                -XX:CMSIncrementalDutyCycle=10 \

                -XX:+PrintGCDetails \

                -XX:+PrintGCTimeStamps \

                -XX:-TraceClassUnloading


                The first three options enable the concurrent collector, i-cms, and i-cms automatic pacing. The next two set the minimum duty cycle to 0 and the initial duty cycle to 10, since the default values (10 and 50, respectively) are too large for a number of applications. The last three options cause diagnostic information on the collection to be written to stdout, so that the behavior of i-cms can be seen and later analyzed.

                5.4.9.3 Basic Troubleshooting

                The i-cms automatic pacing feature uses statistics gathered while the program is running to compute a duty cycle so that concurrent collections complete before the heap becomes full. However, past behavior is not a perfect predictor of future behavior and the estimates may not always be accurate enough to prevent the heap from becoming full. If too many full collections occur, try the following steps, one at a time:

                Increase the safety factor:

                -XX:CMSIncrementalSafetyFactor=<N>

                Increase the minimum duty cycle:

                -XX:CMSIncrementalDutyCycleMin=<N>

                Disable automatic pacing and use a fixed duty cycle:

                -XX:-CMSIncrementalPacing -XX:CMSIncrementalDutyCycle=<N>

                5.4.10 Measurements with the Concurrent Collector

                Below is output for -verbose:gc with -XX:+PrintGCDetails (some details have been removed). Note that the output for the concurrent collector is interspersed with the output from the minor collections. Typically many minor collections will occur during a concurrent collection cycle. The CMS-initial-mark: indicates the start of the concurrent collection cycle. The CMS-concurrent-mark: indicates the end of the concurrent marking phase and CMS-concurrent-sweep: marks the end of the concurrent sweeping phase. Not discussed before is the precleaning phase indicated by CMS-concurrent-preclean:. Precleaning represents work that can be done concurrently and is in preparation for the remark phase CMS-remark. The final phase is indicated by the CMS-concurrent-reset: and is in preparation for the next concurrent collection.

                [GC [1 CMS-initial-mark: 13991K(20288K)] 14103K(22400K), 0.0023781 secs]

                [GC [ DefNew: 2112K->64K(2112K), 0.0837052 secs] 16103K->15476K(22400K), 0.0838519 secs]

                ...

                [GC [DefNew: 2077K->63K(2112K), 0.0126205 secs] 17552K->15855K(22400K), 0.0127482 secs]

                [CMS-concurrent-mark: 0.267/0.374 secs]

                [GC [DefNew: 2111K->64K(2112K), 0.0190851 secs] 17903K->16154K(22400K), 0.0191903 secs]

                [CMS-concurrent-preclean: 0.044/0.064 secs]

                [GC[1 CMS-remark: 16090K(20288K)] 17242K(22400K), 0.0210460 secs]

                [GC [DefNew: 2112K->63K(2112K), 0.0716116 secs] 18177K->17382K(22400K), 0.0718204 secs]

                [GC [DefNew: 2111K->63K(2112K), 0.0830392 secs] 19363K->18757K(22400K), 0.0832943 secs]

                ...

                [GC [DefNew: 2111K->0K(2112K), 0.0035190 secs] 17527K->15479K(22400K), 0.0036052 secs]

                [CMS-concurrent-sweep: 0.291/0.662 secs]

                [GC [DefNew: 2048K->0K(2112K), 0.0013347 secs] 17527K->15479K(27912K), 0.0014231 secs]

                [CMS-concurrent-reset: 0.016/0.016 secs]

                [GC [DefNew: 2048K->1K(2112K), 0.0013936 secs] 17527K->15479K(27912K), 0.0014814 secs]



                The initial mark pause is typically short relative to the minor collection pause time. The times of the concurrent phases (concurrent mark, concurrent precleaning, and concurrent sweep) may be relatively long (as in the example above) when compared to a minor collection pause but the application is not paused during the concurrent phases. The remark pause is affected by the specifics of the application (e.g., a higher rate of modifying objects can increase this pause) and the time since the last minor collection (i.e., more objects in the young generation may increase this pause).


                6. Other Considerations

                For most applications the permanent generation is not relevant to garbage collector performance. However, some applications dynamically generate and load many classes. For instance, some implementations of JSP TM pages do this. If necessary, the maximum permanent generation size can be increased with MaxPermSize.

                Some applications interact with garbage collection by using finalization and weak/soft/phantom references. These features can create performance artifacts at the Java programming language level. An example of this is relying on finalization to close file descriptors, which makes an external resource (descriptors) dependent on garbage collection promptness. Relying on garbage collection to manage resources other than memory is almost always a bad idea.

                Another way applications can interact with garbage collection is by invoking full garbage collections explicitly, such as through the System.gc() call. These calls force major collection, and inhibit scalability on large systems. The performance impact of explicit garbage collections can be measured by disabling explicit garbage collections using the flag -XX:+DisableExplicitGC.

                One of the most commonly encountered uses of explicit garbage collection occurs with RMI's distributed garbage collection (DGC). Applications using RMI refer to objects in other virtual machines. Garbage can't be collected in these distributed applications without occasional local collection, so RMI forces periodic full collection. The frequency of these collections can be controlled with properties. For example,

                java -Dsun.rmi.dgc.client.gcInterval=3600000 
                -Dsun.rmi.dgc.server.gcInterval=3600000 ...

                specifies explicit collection once per hour instead of the default rate of once per minute. However, this may also cause some objects to take much longer to be reclaimed. These properties can be set as high as Long.MAX_VALUE to make the time between explicit collections effectively infinite, if there is no desire for an upper bound on the timeliness of DGC activity.


                The Solaris 8 Operating System supports an alternate version of libthread that binds threads to light-weight processes (LWPs) directly. Some applications can benefit greatly from the use of the alternate libthread. This is a potential benefit for any threaded application. To try this, set the environment variable LD_LIBRARY_PATH to include /usr/lib/lwp before launching the virtual machine. The alternate libthread is the default libthread in the Solaris 9 Operating System.

                Soft references are cleared less aggressively in the server virtual machine than the client. The rate of clearing can be slowed by increasing the parameter SoftRefLRUPolicyMSPerMB with the command line flag -XX:SoftRefLRUPolicyMSPerMB=10000. SoftRefLRUPolicyMSPerMB is a measure of the time that a soft reference survives for a given amount of free space in the heap. The default value is 1000 ms per megabyte. This can be read to mean that a soft reference will survive (after the last strong reference to the object has been collected) for 1 second for each megabyte of free space in the heap. This is very approximate.
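The soft reference policy above amounts to a simple product (a rough sketch; the class name and the 100 MB figure are illustrative):

```java
// Approximate survival time of a softly reachable object after its
// last strong reference goes away, per SoftRefLRUPolicyMSPerMB.
public class SoftRefPolicy {
    static long survivalMillis(long msPerMb, long freeHeapMb) {
        return msPerMb * freeHeapMb;
    }

    public static void main(String[] args) {
        // Default 1000 ms/MB with 100 MB of free heap: softly reachable
        // objects survive roughly 100 seconds.
        System.out.println(survivalMillis(1000, 100)); // 100000
    }
}
```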



                7. Conclusion

                Garbage collection can become a bottleneck in different applications depending on the requirements of the applications. By understanding the requirements of the application and the garbage collection options, it is possible to minimize the impact of garbage collection.

                8. Other Documentation

                  8.1 Example of Output

                The GC output examples document contains examples for different types of garbage collector behavior. The examples show the diagnostic output from the garbage collector and explain how to recognize various problems. Examples from different collectors are included.

                8.2 Frequently Asked Questions

                  A FAQ is included that contains answers to specific questions. The level of detail in the FAQ is generally greater than in this tuning document.

                  As used on the web site, the terms "Java Virtual Machine" and "JVM" mean a virtual machine for the Java platform.


NIO Tutorial

posted Apr 27, 2011, 9:41 PM by Kuwon Kang

Getting started with new I/O (NIO)
The new input/output (NIO) library, introduced with JDK 1.4, provides high-speed, block-oriented I/O in standard Java code. This hands-on tutorial covers the NIO library in great detail, from the high-level concepts to under-the-hood programming detail. You'll learn about crucial I/O elements like buffers and channels, and examine how standard I/O works in the updated library.
Read on-line
Source code

Parts Of JVM And JVM Architecture Diagram?

posted Apr 27, 2011, 5:08 PM by Kuwon Kang

The JVM is the heart of any Java-based application server, and most of the issues we face are due to incorrect JVM tuning. It is very important to understand the overall architecture of the JVM in order to troubleshoot different JVM tuning related issues. Here we are going to discuss the architecture and the major parts of a Java process, and the division of the Java heap.
The following diagram is just a basic overview of a Java process on a machine with a 2 GB process size. Usually on 32-bit Windows operating systems the default process size is 2 GB (on 64-bit Unix-based operating systems it can be 4 GB or more). So I drew the following diagram of a Java process to explain the Java process partitions on a machine with a 2 GB process size.
Java Process Architecture Diagram

In the above diagram we can see the different partitions of a Java process. Please compare the diagram with the descriptions below.
1). Just as an example, the process size is 2048 MB (2 GB)
2). The Java heap size is 1024 MB (1 GB): -Xmx1024m
3). Native space = (ProcessSize – MaxHeapSize – MaxPermSize), which means around 768 MB of native space
4). MaxPermSize is around 256 MB: -XX:MaxPermSize=256m
5). The young generation space is around 40% of the maximum Java heap
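The native-space arithmetic in the list above can be written out directly (the class name is illustrative; all values in MB):

```java
// Native space left to a process after the Java heap and PermGen
// are carved out, per the formula ProcessSize - MaxHeapSize - MaxPermSize.
public class ProcessLayout {
    static int nativeSpaceMb(int processSizeMb, int maxHeapMb, int maxPermSizeMb) {
        return processSizeMb - maxHeapMb - maxPermSizeMb;
    }

    public static void main(String[] args) {
        // 2 GB process, 1 GB heap, 256 MB PermGen -> 768 MB native space.
        System.out.println(nativeSpaceMb(2048, 1024, 256)); // 768
    }
}
```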

What Are these Different Parts?

Eden Space:

Eden space is the part of the Java heap where the JVM initially creates objects; most objects die there and are quickly cleaned up by minor garbage collections (note: a full garbage collection is different from a minor garbage collection). Usually any new objects created inside a Java method go into eden space, and the objects' space is reclaimed once the method execution completes. Whereas the instance variables of a class usually live longer, until the object based on that class gets destroyed. When eden fills up it causes a minor collection, in which some surviving objects are moved to an older generation.

Survivor Spaces:

The young generation has two survivor spaces alongside eden. One survivor space is empty at any given time. The survivor spaces serve as the destination of the next copying collection of any living objects in eden and the other survivor space.
The parameter SurvivorRatio can be used to tune the size of the survivor spaces.
-XX:SurvivorRatio=6 sets the ratio between each survivor space and eden to be 1:6.
If survivor spaces are too small, copying collection overflows directly into the tenured generation.
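The SurvivorRatio arithmetic can be sketched as follows (the class name and the 8192 KB young generation are illustrative; this assumes the young generation is divided into eden plus two survivor spaces):

```java
// Size of one survivor space for a given young generation size and
// -XX:SurvivorRatio=<n>, where eden is n times one survivor space.
public class SurvivorSizes {
    static long survivorKb(long youngGenKb, int survivorRatio) {
        // young gen = eden + 2 survivors = (ratio + 2) survivor-sized parts
        return youngGenKb / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        // SurvivorRatio=6 on an 8192 KB young generation:
        // each survivor space is 1024 KB, eden is 6144 KB.
        System.out.println(survivorKb(8192, 6)); // 1024
    }
}
```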

Young Generation: (-XX:MaxNewSize)

Up through JDK 1.3 we used to set the young generation size using -XX:MaxNewSize, but from JDK 1.4 onwards we set the young generation size using the -Xmn JVM option.
The young generation size is also controlled by NewRatio. Setting -XX:NewRatio=3 means that the ratio between the old generation and the young generation is 3:1.
Similarly, -XX:NewRatio=8 means an 8:1 ratio of tenured to young generation.
NewRatio: NewRatio is the ratio of the old generation to the young generation; it has default values of 2 on Sparc, 12 on client Intel, and 8 everywhere else.
NOTE: From JDK 1.4 the young generation size can be set using -Xmn as well.
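The NewRatio arithmetic can be sketched as follows (the class name and the 1024 MB heap are illustrative):

```java
// Young generation size implied by -XX:NewRatio=<n> on a heap of a
// given size: old:young = n:1, so young = heap / (n + 1).
public class NewRatioSizes {
    static long youngGenMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    public static void main(String[] args) {
        // -XX:NewRatio=3 on a 1024 MB heap: young = 256 MB, old = 768 MB.
        System.out.println(youngGenMb(1024, 3)); // 256
    }
}
```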

Virtual Space-1: (MaxNewSize – NewSize)

The first virtual space shows the difference between -XX:NewSize and -XX:MaxNewSize; that is, it is the difference between the initial young generation size and the maximum young generation size.

Java Heap Area: (-Xmx and -Xms)

Java Heap is a memory area inside the Java process which holds the Java objects. The Java heap is the combination of the young generation heap and the old generation heap. We can set the initial Java heap size using the -Xms JVM parameter; similarly, the maximum heap size can be set with the -Xmx JVM parameter.

Example:
-Xmx1024m —> sets the maximum heap size to 1 GB
-Xms512m —> sets the initial Java heap size to 512 MB

NOTE-1): It is often recommended to set the initial and maximum heap sizes to the same value for better performance.
NOTE-2): The theoretical limit on the maximum heap size for a 32-bit JVM is 4 GB. Because of memory fragmentation, kernel address-space usage, swap usage, and virtual machine overheads, the JVM does not allow us to allocate the whole 4 GB for the heap on a 32-bit JVM; on 32-bit Windows operating systems the practical maximum is usually around 1.4 GB to 1.6 GB.
.
If our application requires a larger memory allocation, we must choose a 64-bit operating system with a 64-bit JVM. A 64-bit JVM provides a much larger address space, so we can have a much larger Java heap along with an increased thread-allocation area. Depending on the operating system, on a 64-bit JVM you can set the maximum heap size to 32 GB or more.
Example:        -Xms32g -Xmx32g -Xmn4g
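The effect of -Xms and -Xmx can be observed from inside the process with the standard java.lang.Runtime API; a minimal sketch (the printed values depend on the flags the JVM was started with):

```java
public class HeapReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();     // upper bound, roughly the -Xmx value
        long total = rt.totalMemory(); // currently committed heap, starts near -Xms
        long free = rt.freeMemory();   // unused part of the committed heap

        System.out.println("max heap (MB): " + max / (1024 * 1024));
        System.out.println("committed heap (MB): " + total / (1024 * 1024));
        System.out.println("used heap (MB): " + (total - free) / (1024 * 1024));
    }
}
```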

Virtual Space-2: (MaxHeapSize – InitialHeapSize)

The second virtual space is the difference between the maximum heap size (-Xmx) and the initial heap size (-Xms). It is called virtual space because the JVM initially allocates only the initial heap size, after which the heap can grow as required up to MaxHeapSize.

PermGen Space: (-XX:MaxPermSize)

PermGen is a non-heap memory area where class loading happens; the JVM allocates space for classes, class metadata, Java methods, and reference objects here. The PermGen is independent of the heap area and can be sized as required using the -XX:MaxPermSize and -XX:PermSize JVM options. Garbage collection happens in this area of JVM memory as well, and is called "class GC". We can disable class garbage collection with the -Xnoclassgc JVM option; if it is added when starting the server, class instances that are no longer required will not be garbage collected.

Native Area:

Native memory is an area the JVM uses for its internal operations and to execute JNI code. The JVM uses native memory for code optimization and for loading classes and libraries, along with intermediate code generation.
The size of the native memory depends on the architecture of the operating system and the amount of memory already committed to the Java heap. Native memory is the process area where JNI code, JVM libraries, native performance packs, and proxy modules are loaded.
There is no JVM option available to size the native area, but we can approximate it with the following formula:
NativeMemory = (ProcessSize – MaxHeapSize – MaxPermSize)
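A worked example of the formula above (plain arithmetic; the process size is a hypothetical figure you would read from an OS tool such as ps or GlancePlus, and the heap/PermGen sizes are assumed flag values):

```java
public class NativeMemoryEstimate {
    public static void main(String[] args) {
        // Hypothetical figures for illustration only.
        long processSizeMb = 1400; // process size as reported by an OS tool
        long maxHeapMb = 1024;     // -Xmx1024m
        long maxPermMb = 256;      // -XX:MaxPermSize=256m

        // NativeMemory = ProcessSize - MaxHeapSize - MaxPermSize
        long nativeMb = processSizeMb - maxHeapMb - maxPermMb;
        System.out.println("approx. native memory: " + nativeMb + " MB");
    }
}
```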

Difference between Externalizable and Serializable in Java

posted Mar 27, 2011, 4:00 PM by Kuwon Kang

Difference between Externalizable and Serializable in Java

One obvious difference is that Serializable is a marker interface and doesn't contain any methods, whereas the Externalizable interface contains two: writeExternal(ObjectOutput) and readExternal(ObjectInput). But the main difference between the two is that the Externalizable interface gives the implementing class complete control over the object serialization process, whereas the Serializable interface normally uses a default implementation to handle it.

While implementing Serializable, you are not forced to define any methods, as it's a marker interface; however, you can use the writeObject and readObject methods to handle the serialization of complex objects. But while implementing the Externalizable interface, you are bound to define the two methods writeExternal and readExternal, and the whole object serialization process is handled solely by these two methods.
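A minimal sketch of that contract (class and field names are made up for illustration): the Externalizable implementor writes and reads its own fields explicitly, and must also provide a public no-arg constructor, which the runtime invokes before calling readExternal.

```java
import java.io.*;

public class Demo {
    // The implementing class fully controls what goes into the stream.
    public static class Point implements Externalizable {
        private int x;
        private int y;

        public Point() {}                        // required: called before readExternal
        public Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);                     // only the data, no field names/types
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            x = in.readInt();                    // must read back in the same order
            y = in.readInt();
        }

        @Override
        public String toString() { return "Point(" + x + "," + y + ")"; }
    }

    public static void main(String[] args) throws Exception {
        // Round-trip the object through an in-memory byte stream.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(new Point(3, 4));
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            System.out.println(ois.readObject());
        }
    }
}
```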

With a Serializable implementation, the state of superclasses is automatically taken care of by the default implementation, whereas with Externalizable the implementing class needs to handle everything on its own, as there is no default implementation in this case.

Example Scenario: when to use what?

If everything is automatically taken care of by implementing the Serializable interface, why would anyone implement the Externalizable interface and bother to define the two methods? Simply to have complete control over the process. Let's take an example to understand this. Suppose we have an object with hundreds of (non-transient) fields, and we want only a few of them to be stored on persistent storage. One solution would be to declare all the other fields (except those we want to serialize) transient, and the default serialization process will automatically take care of the rest. But what if those few fields are not fixed at design time and are instead decided conditionally at runtime? In such a situation, implementing the Externalizable interface will probably be the better solution. Similarly, there may be scenarios where we simply don't want to maintain the state of the superclasses (which is maintained automatically by a Serializable implementation).

Which has better performance - Externalizable or Serializable?

In most cases (or in all, if implemented correctly), Externalizable is more efficient than Serializable, for the simple reason that with Externalizable the entire process of marshalling, unmarshalling, writing to the stream, and reading back from it is under your control: you write the code, so you can choose the best approach for your situation. With Serializable, all of this (or at least most of it) is done implicitly, and the internal implementation, being generic enough to support any possible case, cannot be the most efficient. The other reason Serializable is less efficient is that several reflective calls are made internally to get the metadata of the class; no such calls are needed with Externalizable.

However, the efficiency comes at a price. You lose flexibility, because as soon as your class definition changes you will probably need to modify your Externalizable implementation as well. Additionally, since you have to write more code with Externalizable, you increase the chances of adding bugs to your application.

Another disadvantage of Externalizable is that you must have the class available to interpret the stream, since the stream format is opaque binary data. Normal serialization adds field names and types into the stream (this is why the reflective calls are needed), so it is possible to reconstruct the object even without the object's class being available; you would need to write the object-reconstruction code yourself, though, as Java serialization doesn't provide any such API at the moment. The point is that with Serializable you can at least write such code, because the stream is enriched with field names and types, whereas with Externalizable the stream contains just the data, so you can't reconstruct anything without the class definition. As you can see, Serializable not only makes many reflective calls but also puts name/type information into the stream, and this takes time, making serialization slower than the corresponding Externalizable process, where only the data goes into the stream.

Note: One of our visitors, Manish asked this question in response to the post on Externalizable interface. Thanks Manish for bringing this up and I hope this article will help you in understanding the differences between the two interfaces. Keep visiting/posting! 

HP JVM Memory Architecture

posted Aug 2, 2010, 6:03 PM by Kuwon Kang


»Previous topic: Using Java™ 2 JNI on HP-UX
»Next topic: Debugging
»Back to table of contents
Table of contents
»Determine your requirements
»Memory layout under HP-UX 11.0 PA-RISC
»Additional memory available under HP-UX 11i PA-RISC
»Allocating physical memory and swap in the Java™ heap
»Diagnosing Memory Leaks
»Useful key command line options for allocating memory
»Application dependent considerations when using large heap size HP-UX 11i PA-RISC
»Expanding heap size in native applications on PA-RISC HP-UX 11.11 and later releases
»Expanding heap size in native applications on Integrity HP-UX 11.23 and later releases
»Expanding heap size in HP-UX PA-RISC
»Expanding heap size in HP-UX Integrity

Determine your requirements

It's important to understand the real requirements of your application. We recommend that you perform sizing tests before deployment with a realistic load, while monitoring with -Xverbosegc and a tool like GlancePlus. Learn more about GlancePlus and the -Xverbosegc option.

Our Java™ Performance Tuning website has much valuable information on profiling and performance tools.

In addition, HP-UX patches may be needed for expanding memory size. Read the release notes for your SDK release to determine if you need a HP-UX patch.

Memory layout under HP-UX 11.0 (PA-RISC only)

In the HP-UX 32-bit process memory layout, there are four "spaces" used in the runtime:

---------- 0x00000000 
| text | 
---------- 0x40000000 
| data | 
---------- 0x80000000 
| sh mem | 
---------- 0xc0000000 
| sh mem | 
---------- 0xffffffff

Application "text," the code in the executable, goes in text space.

Shared libraries get mapped into the shared memory, usually above 0xc0000000 but if there are many shared libraries or a lot of shared memory in use they will creep down into the 0x80000000-0xbfffffff range.

So in a normal executable, writable data is in the range 0x40000000 to 0x7fffffff. C heap starts in the 0x4-------'s, and mmap-ed areas start in the 0x7's and work back down. Thread stacks also start in the 0x7's, and get allocated at lower and lower addresses as more threads get allocated.

Almost all of the native code for the JDK is in shared libraries. There is just a very small amount of code in the launcher down in text space. We take advantage of otherwise unused space in text space by linking with EXEC_MAGIC. With EXEC_MAGIC, on HP-UX 11.0, our memory layout looks like this.

---------- 0x00000000 
| text | 
---------- the java launcher uses memory up to about 0x00008000; 
           it is writeable under EXEC_MAGIC. 
| data | 
---------- 0x80000000 
| sh mem | 
---------- 0xc0000000 
| sh mem | 
---------- 0xffffffff

The PA1.1 binary is not EXEC_MAGIC because the JIT in the Classic VM is not compatible with EXEC_MAGIC. The PA2.0 launcher binary is EXEC_MAGIC. With EXEC_MAGIC, we have all the space from around 0x00008000 until 0x80000000 as writeable data area. The C heap now starts way down there, which allows us to allocate a Java™ heap larger than 1GB. However, you should be aware of certain considerations which might be application dependent. These are described in Application dependent considerations when using large heap size HP-UX 11i PA-RISC.

Additional memory available under HP-UX 11i (PA-RISC only)

With HP-UX 11i, Java™ supports a Java™ heap as large as 3.8GB. Space previously reserved for shared memory can be used for process private writeable data. This should be used with caution, however, as a Java™ heap this large is subject to lengthy pauses for garbage collection. Due to the design of HP-UX, the heap is not one contiguous mmap. When the heap is getting this large, either the new space, set with -Xmn, or old space, set by the difference of Xmx and Xmn, must be 900MB or less.

-------------- 0x00000000 
| text | 
-------------- around 0x00008000 
| data | 
-------------- 0x80000000 
| data | 
-------------- 0xc0000000 
| data+shmem | 
-------------- 0xffffffff

When using a large Java™ heap, you should be aware of certain considerations which might be application dependent. These are described in Application dependent considerations when using large heap size HP-UX 11i PA-RISC. In addition, be aware that if you need to load extra native libraries at runtime, you need to carefully test to ensure that you haven't used up all your address space for Java™ heap so that there won't be enough shared memory left to load another shared library.

Allocating physical memory and swap in the Java™ heap

The Hotspot JVM uses mmap to reserve memory for the various spaces within the Java™ heaps. These memory mapped regions are created during initialization and are sized so that they can hold the maximum size of heap specified with the -Xmx command line option.

Normally the HP-UX reserves swap space for the whole of these memory mapped heap regions when they are first mapped. However, in order to conserve system swap resources, the JVM, by default, maps these regions using the mmap MAP_NORESERVE flag (see man 2 mmap for details). When using this option, no swap space is reserved when the region is first created, swap will only be reserved for each page as and when it is first used. In both cases physical memory pages will only be allocated when the pages are first used.

The JVM doesn't necessarily use all the space that it's reserved for the Java™ heap. On initialization it commits a portion of this memory, the size of which is controlled by the -Xms option. The size of this committed area (as shown by the capacity value in the -Xverbosegc output) can vary between this minimum value and the maximum allowable size (controlled by the -Xmx option), as the amount of retained objects in the heap increases and decreases.

Starting with SDK 1.4.1.05, the way that physical memory and swap space are allocated within the Java™ heap has changed. The JVM now ensures that swap is reserved for the whole of the currently committed area of the heap by touching each memory page. It also now explicitly releases this swap if the size of the committed area decreases. (These decreases are not very common.) Prior to 1.4.1.05 the JVM did neither of these.

Note that under HP-UX, when memory pages are touched, physical memory is allocated and swap is reserved as well. As a result, in some applications you may now see a larger memory footprint. When you monitor your Java™ processes with Glance or other tools, you are likely to see higher RSS memory usage, especially on startup, because the memory is being allocated earlier than before. In some cases, your application startup may be slightly slower. It is the physical memory allocation that shows up in the increased RSS values, not the swap reservation.

Useful key command line options for allocating memory

Below are three important command line options with regard to allocating memory, and some examples of how to use them. (For a complete list of HotSpot JVM options, refer to the Programmer's Guide chapter on HotSpot technology.)

-Xms<size> 
Specifies the initial size of the committed memory within the Java™ heap. It also specifies the minimum size of the committed area. The value must be a multiple of 1024 greater than 1MB. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes.

For example, -Xms6291456, -Xms6144k, -Xms1500M

-Xmx<size> 
Specifies the maximum size of the Java™ heap. The value must be a multiple of 1024 greater than 2MB. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes.

For example, -Xmx83886080, -Xmx81920k, -Xmx1500M

You will notice the increased memory footprint described in the previous section most if you set -Xms to the same value as -Xmx. This is because the whole of the associated heap spaces are committed and the JVM reserves swap for them by touching every page. In this case it would be better to use the -XX:+ForceMmapReserved option because it is more efficient.

-XX:+ForceMmapReserved 
Tells the JVM to reserve the swap space for all large memory regions used by the JVM (Java™ heap). This effectively removes the MAP_NORESERVE flag from the mmap call used to map the Java™ heaps and ensures that swap is reserved for the full memory mapped region when it is first created. When using this option the JVM no longer needs to touch the memory pages within the committed area to reserve the swap and as a result , no physical memory is allocated until the page is actually used by the application.

Examples

-Xmx1500M 
The initial Java™ heap memory commitment will be relatively small. This approach maximizes the availability of system swap across your application machine. In SDK 1.4.1.05, the initial JVM footprint increases by about 22Mb; in SDK 1.4.2.00, it increases by approximately 33Mb.

-Xmx1500M -Xms1500M 
The initial Java™ heap memory commitment will be 1500Mb. You may notice a pause during startup, even on a fast machine, as the memory pages are touched. In this case it would be better to use the -XX:+ForceMmapReserved option because it uses memory more efficiently. If your application typically uses less memory than this, setting -Xmx and -Xms to the same value does not make the most efficient use of system shared memory. You may want to use a smaller value for -Xms instead.

-Xmx1500M -XX:+ForceMmapReserved 
There will be no pause corresponding to initial memory commitment. 1500Mb of swap is reserved for the Java™ application. This swap cannot be shared with any other processes on the system.

Application dependent considerations when using large heap size HP-UX 11i PA-RISC

Thread stacks and C heap are allocated from the same address space as the Java™ heap, so if you set the Java™ heap too large, new threads may not start correctly. Or other parts of the runtime or native methods may suddenly fail if the C heap cannot allocate a new page. An application may start up correctly with a 1.7GB heap, but this does not necessarily mean it's going to work correctly.

For example, if you use a 1MB stack size (-Xss1m), and there are about 80 threads in the process, you will have 80MB for stacks. If you have native libraries, you would probably add another 64MB for C heap. You have now used a total of 144MB of your heap for stacks and C heap, so this address space is not available for Java™ heap.

Because all programs have varying C heap requirements and have varying numbers of threads, it's difficult to ascertain what the effect will be of running the application at its limit. It's important to understand the real requirements of your application. We recommend that you perform sizing tests before deployment with a realistic load, while monitoring with the -Xverbosegc option and a tool like GlancePlus.

Expanding heap size in native applications on PA-RISC HP-UX 11.11 and later releases

If you embed libjvm in a native 32-bit application and wish to use a large Java™ heap, you need to ensure that enough private data space is enabled. You can expand your available memory space from 1GB to around 1.7GB on HP-UX 11.11 and later releases by using HP-UX's EXEC_MAGIC; link your executable with "-N". On HP-UX 11.11, you may need to install HP-UX patch PHKL_35564 (or its superseded patch) to get a larger Java™ heap. Releases after 11.11 do not require any patches for this feature. Use the commands shown below to get the larger Java™ heap.

  • For Java™ heap greater than 1500MB:

    chatr +q3p enable <executable name>

  • For Java™ heap greater than 2400MB:

    chatr +q3p enable +q4p enable <executable name>

Also refer to Application dependent considerations when using large heap size HP-UX 11i PA-RISC.

Expanding heap size in native applications on Integrity HP-UX 11.23 and later releases

If you embed libjvm in a native 32-bit application and wish to use a large Java™ heap, you need to ensure that enough private data space is enabled. You can expand your available memory space from 1GB to around 1.7GB on HP-UX 11.23 and later releases by using HP-UX's EXEC_MAGIC; link your executable with "-N".

  • For Java™ heap greater than 1500MB:

    chatr +as mpas <executable name>;

Expanding heap size in HP-UX PA-RISC1

Hotspot supports heaps larger than 3GB on HP-UX PA-RISC. In theory, a process can have 3.8GB address space. However, the address space available as java heap is smaller than 3.8GB due to reserved address space for primordial stack, other private segment in JVM like permanent generation, code cache and interpreter, and other reserved address range and alignment.

When Java™ is invoked from the command line on HP-UX PA-RISC, Hotspot automatically chooses an appropriate executable. This is how Hotspot chooses the executable for SDK 1.4.2.09 and JDK 5.0.01 and older releases:

  • For heaps less than 1500MB, the executable is 'java'.
  • For heaps greater than or equal to 1500MB, and less than 2400MB the executable is 'java_q3p'.
  • For heaps of 2400MB to 3.8GB, the executable is 'java_q4p'.

This is how Hotspot chooses the executable for SDK 1.4.2.10, JDK 5.0.02, JDK 6.0.00 and later releases:

  • For (heaps + max perm + stack limit) less than 1600MB, the executable is 'java'.
  • For heaps (heaps + max perm + stack limit) greater than or equal to 1600MB, and less than 2500MB the executable is 'java_q3p'.
  • For (heaps + max perm + stack limit) of 2500MB to 3.8GB, the executable is 'java_q4p'.

Note: For releases after SDK 1.4.2.10 and JDK 5.0.02, when using CMS, the estimated threshold for switching is about 3% less than the values above, assuming default values are used for NewSize, MaxNewSize, NewRatio, CMSMarkStackSize and CMSRevisitStackSize.

(heaps mean total size of java heap determined by -Xmx option. max perm is perm generation size limit determined by -XX:MaxPermSize=... option. stack limit is primordial stack size limit which is initialized by kernel parameter maxssiz unless it is changed explicitly by rlimit(2), shell's 'ulimit -s', etc.)

You do not need to invoke these programs directly. Just invoke 'java' as usual, and the appropriate program will be run for you.

In addition, be aware that if you wish to use very large heaps, because of segmentation in the HP-UX virtual address space, when the Java™ heap is larger than 3000MB, either new space (-Xmn) or old space (-Xmx minus -Xmn) must be approximately 850MB or less.

Also refer to Application dependent considerations when using large heap size HP-UX 11i PA-RISC.

Expanding heap size in HP-UX Integrity2

Hotspot 1.4.2.X/5.0.X/6.0.X running in 32-bit mode supports heaps larger than 3GB on HP-UX 11.23 and later. In theory, an mpas 32-bit process on HP-UX 11.23 and later can have 4GB address space. However, the address space available as java heap is smaller than 4GB due to reserved address space for primordial stack, other private segment in JVM like permanent generation, code cache and interpreter, and other reserved address range and alignment.

For Java™ invoked from the command line on HP-UX 11.23, Java™ will automatically choose an appropriate executable.

This is how Hotspot chooses the executable for SDK 1.4.2.09, JDK 5.0.01 and older releases:

  • For heaps less than 1700MB, the executable is 'java'.
  • For heaps greater than or equal to 1700MB the executable is 'java_q4p`.

This is how Hotspot chooses the executable for SDK 1.4.2.10, JDK 5.0.02, JDK 6.0.00 and later releases:

  • For (heaps + max perm + stack limit) less than 1800MB, the executable is 'java'.
  • For (heaps + max perm + stack limit) greater than or equal to 1800MB the executable is 'java_q4p`.

Note: For releases after SDK 1.4.2.10 and JDK 5.0.02, when using CMS, the estimated threshold for switching is about 3% less than the values above, assuming default values are used for NewSize, MaxNewSize, NewRatio, CMSMarkStackSize and CMSRevisitStackSize.

(heaps mean total size of java heap determined by -Xmx option. max perm is perm generation size limit determined by -XX:MaxPermSize=... option. stack limit is primordial stack size limit which is initialized by kernel parameter maxssiz unless it is changed explicitly by rlimit(2), shell's 'ulimit -s', etc.)

You do not need to directly invoke these programs. Just invoke 'java' as usual, and the appropriate program will be run for you.

Recent IPF JVMs have -Xmpas:on and -Xmpas:off options. If -Xmpas:on is used, java_q4p is executed regardless of the Java heap size. This might be useful when your Java application needs a large malloc area or a large number of threads.

On PA-RISC system, Q3P and Q4P executables provide larger private data space. On IPF HP-UX, MPAS (Mostly Private Address Space) executable provides larger private address space. A Non-MPAS executable has about 1.9GB private address space and an MPAS executable can have close to 4GB private address space. HP JVM for IPF HP-UX with large data space support is an MPAS executable and named java_q4p.

»Please let us know additional information you'd like to see in the programmer's guide

1, 2 The numbers provided in "Expanding heap size in HP-UX PA-RISC" and "Expanding heap size in HP-UX Integrity" are approximate and might change from release to release.

Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. Hewlett-Packard is independent of Sun Microsystems, Inc.

HP JVM Options

posted Jul 21, 2010, 7:46 PM by Kuwon Kang

»Previous topic: Introduction
»Next topic: HP-UX configuration for Java™ support
»Back to table of contents

Unless stated otherwise, the information in this Programmer's Guide applies to HP-UX PA-RISC and HP-UX Itanium® Processor Family systems.

HotSpot technology is the default mode. If you want to run your 1.2 or 1.3 application without the HotSpot technology, you will need to use the -classic option. The -classic mode options are covered in "Classic Technology Tools, Commands, and Environment Variables."

Versions 1.4 and 5.0 are not supported on the Classic VM.

The HP-UX SDK, for the Java™ 2 Platform includes all the standard Java™ tools and provides HP enhancements. The Sun Microsystems' tools pages and HP-Specific features are provided below.

Table of contents
»Java™ tools pages
»Setting the class path
»How classes are found
»Excluding methods from being compiled
»Closing a socket when accept or read is pending (PA-RISC)
»Standard and non-standard options
»FastSwing (version 1.3x only)
»Non-blocking I/O Poll API (SDK 1.3 and later)
»IPv6 support (Internet Protocol version 6) (SDK 1.4 and later)

Java™ tools pages

The links below will take you to Sun Microsystems' documentation for the tools in the Java™ 2 SDK. All of the following tools are supported in our Java™ 2 platform Standard Edition™ for HP-UX.

 »Basic tools (javac, java, javadoc, appletviewer, jar, jdb, javah, javap, extcheck)
 »Remote Method Invocation (RMI) tools (rmic, rmiregistry, rmid, serialver) Internationalization Tools (native2ascii)
 »Security tools (keytool, jarsigner, policytool, kinit, klist, ktab)
 »Java™ IDL and RMI-IIOP Tools (tnameserv, idlj, orbd (1.4 only), servertool)

Setting the class path

The class path is the path that the Java™ runtime environment searches for classes and other resource files.

For information on how to set the class search path (more commonly known by the shorter name, "class path") refer to Sun Microsystems' documentation at: http://java.sun.com/j2se/1.4.2/docs/tooldocs/solaris/classpath.html

How classes are found

For documentation on how the Java™ launcher and javac and javadoc find classes, refer to Sun Microsystems' documentation at: http://java.sun.com/j2se/1.4.2/docs/tooldocs/findingclasses.html

Excluding methods from being compiled

To prevent the HotSpot runtime compiler from compiling certain methods, you can create a file called .hotspot_compiler and add the methods to be excluded to that file.

For example, if you want to exclude java/lang/String.indexOf() from being compiled, you would add the following line to the .hotspot_compiler file:

exclude java/lang/String indexOf

By default, the HotSpot VM looks for .hotspot_compiler under the directory where libjvm.sl resides. In addition, it looks for a .hotspot_compiler file in the current directory where the JVM was started.

For example, if you are running the JVM on a PA2.0 server, narrow mode, and the JVM was started from a script called run.sh in the directory /app/myapp/bin, it first looks in the directory {JAVA_HOME}/jre/lib/PA_RISC2.0/server and then it looks for a .hotspot_compiler file in the /app/myapp/bin directory.

Another way to exclude a method is to specify the .hotspot_compiler file using the VM option 
-XX:CompileCommandFile=<list of .hotspot_compiler files separated by ":">

Example: -XX:CompileCommandFile=/tmp/foo/.hotspot_compiler_app_version_71:\ 
/tmp/foo2/hc81

If you specify the -XX:CompileCommandFile option it overrides the default behavior of the VM and the VM will NOT scan either the libjvm.sl directory or the current directory for a .hotspot_compiler file.

Closing a socket when accept or read is pending (PA-RISC)

The java command line option -XdoCloseWithReadPending allows one thread to close() a socket when there is an outstanding read pending on that same socket from another thread. The default behavior when close() is called on a socket which has an outstanding read call from another thread is for the close() to block until the read call completes. With the 
-XdoCloseWithReadPending option, the socket close() call closes the socket and, in the context of the thread with the pending read, a Socket Exception with the message "Socket closed" is thrown.
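The close-while-reading behavior described above can be sketched with a small self-contained test (names are illustrative; on current JVMs this is the default behavior, so no special flag is needed, and the pending read fails with a SocketException once the socket is closed):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

public class CloseWithReadPending {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);           // any free port
        Socket client = new Socket("localhost", server.getLocalPort());
        Socket accepted = server.accept();

        final String[] result = new String[1];
        Thread reader = new Thread(() -> {
            try {
                client.getInputStream().read();              // blocks: no data is sent
                result[0] = "read returned";
            } catch (SocketException e) {
                result[0] = "SocketException: " + e.getMessage();
            } catch (IOException e) {
                result[0] = "IOException: " + e.getMessage();
            }
        });
        reader.start();

        Thread.sleep(200);   // give the reader time to block in read()
        client.close();      // close the socket while the read is pending
        reader.join();

        System.out.println(result[0]);
        accepted.close();
        server.close();
    }
}
```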

In some versions of the VM, changes were made to the mechanism by which a socket is closed; in those versions you no longer need to use the -XdoCloseWithReadPending option for the close() call to close the socket.

However HP-UX patches are necessary for some PA-RISC versions. Please refer to the table below that describes whether HP-UX patches are needed for your Java™ version and PA-RISC operating system, and whether the flag needs to be used.

For HP Integrity systems, no flag and no HP-UX patches are required.

Java™ SDK Version          | Architecture | HP-UX Version      | HP-UX Patches Required                                      | JVM Flags Required
5.0                        | PA-RISC 2.0  | 11.11, 11.23       | None                                                        | None
1.4.2.00 & 1.4.2.01        | PA-RISC      | 11.0, 11.11        | None                                                        | -XdoCloseWithReadPending
                           |              | 11.23              | None                                                        | None
1.4.2.02 through 1.4.2.05  | PA-RISC      | 11.0, 11.11, 11.23 | None                                                        | None
1.4.2.06 & 1.4.2.07        | PA-RISC      | 11.0               | None                                                        | None
                           |              | 11.11              | PHKL_32457, PHKL_25840, PHKL_25842, PHNE_25644, PHNE_25084  | None
                           |              | 11.23              | None                                                        | None
1.3.1.00 & 1.3.1.01        | PA-RISC      | 11.00, 11.11       | None                                                        | -XdoCloseWithReadPending
                           |              | 11.23              | None                                                        | None
1.3.1.02                   | PA-RISC      | 11.00              | PHNE_26728, PHNE_27063, PHNE_27092, PHKL_25842, PHKL_25995  | None
1.3.1.04 through 1.3.1.07  | PA-RISC      | 11.00, 11.11       | None                                                        | -XdoCloseWithReadPending
                           |              | 11.23              | None                                                        | None
1.3.1.08 through 1.3.1.13  | PA-RISC      | 11.00              | PHNE_26771                                                  | None
                           |              | 11.11              | PHNE_28089                                                  | None
                           |              | 11.23              | None                                                        | None
1.3.1.14 & 1.3.1.15        | PA-RISC      | 11.00              | None                                                        | None
1.3.1.15                   |              | 11.11, 11.23       |                                                             |

Standard and non-standard options

The HP-UX SDK for the Java™ 2 Platform tools install into the /opt/java1.2/bin, /opt/java1.3/bin, /opt/java1.4/bin or /opt/java1.5/bin directory, depending on which SDK release you are installing. The installation process will update your system PATH to include the appropriate directory to allow you to invoke Java™ applications.

To see the full list of java options, enter the command:

java -help

To see the full list of non-standard options, enter the command:

java -X

For more information on each option, run the following command:

java <name of option>:help

For example, to see more information on -Xrunhprof, run this command:

java -Xrunhprof:help

The HotSpot technology accepts all of the standard options as well as the following partial list of non-standard -X and -XX options. Non-standard options are not guaranteed to be supported on all VM implementations, and are subject to change without notice in subsequent releases of the Java™ 2 SDK.

See also Sun Microsystems' "Java™ HotSpot VM Options" at http://java.sun.com/docs/hotspot/VMOptions.html.

»-classic (version 1.2.2 and 1.3)
»-d64 (SDK version 1.4 and later)
»-Dhpux.font and -Dhpux.font.dpi (version 1.2.2 and later)
»-Dhpux.im.disable (version 1.2.2 and later)
»-Dhpux.im.enable.awt (version 1.2.2 and later)
»-Dhp.swing.useFastSwing (version 1.3.1 only)
»-pa11 (version 1.2.2 and later) PA-RISC only
»-verbosegc
»-Xbatch
»-Xbootclasspath
»-XdoCloseWithReadPending
»-Xeprof (version 1.2.2.05 and later)
»-XheapInitialSizes
»-Xincgc
»-Xint
»-Xmn<size>
»-Xms<size>
»-Xmx<size>
»-Xnocatch
»-Xnoclassgc
»-Xoptgc
»-Xprep
»-Xprof
»-Xrs (version 1.3.1 and later)
»-Xrunhprof
»-Xshare:on, -Xshare:off, -Xshare:auto
»-Xss<size>
»-Xusealtsigs
»-Xverbosegc<options>
»-XX:+AggressiveHeap
»-XX:+AllowUserSignalHandlers
»-XX:CompileCommandFile
»-XX:+DisableExplicitGC
»-XX:+ForceMmapReserved
»-XX:+HeapDump
»-XX:+HeapDumpOnly
»-XX:+HeapDumpOnCtrlBreak
»-XX:+HeapDumpOnOutOfMemoryError
»Interaction of HeapDump options
»-XX:MainThreadStackSize=<value>
»-XX:MaxDirectMemorySize=<size>
»-XX:MaxNewSize=<size>
»-XX:MaxPermSize=<size>
»-XX:PermSize=<size>
»-XX:NewSize=<size>
»-XX:NewSizeThreadIncrease=<size>
»-XX:-NoHandoff (version 1.3 and later)
»-XX:PrefetchCopyIntervalInBytes=n
»-XX:PrefetchScanIntervalInBytes=n
»-XX:PrefetchScavengeFieldsAhead=n
»-XX:+PrefetchMarkSweep
»-XX:SchedulerPriorityRange=SCHED*,base,top
»-XX:+ServerApp
»-XX:SurvivorRatio=<size>
»-XX:+UseCompilerSafepoints (version 1.3.1 PA-RISC and later)
»-XX:+UseOnStackReplacement (version 1.3.1 PA-RISC and later)
»-XX:+UseParallelGC (version 1.4 and later)
»-XX:+UseSIGUSR2
»-XX:+UseGetTimeOfDay

-classic (version 1.2.2 and 1.3) 
Use the -classic command line option to run your application without the HotSpot technology. If you use the -classic option, it must be the first option in the java command line. The -classic mode options are covered in "Classic Technology Tools, Commands, and Environment Variables." The Classic VM is not supported on SDK 1.4 and JDK 5.0.

-d64 (SDK version 1.4 and later) 
Runs Java™ in 64-bit mode. In HP SDK 1.4, the interpreter and compiler modes -Xint, -Xmixed, and -Xcomp are supported.

-Dhpux.font and -Dhpux.font.dpi (version 1.2.2 and later) 
To change the system font size of your Java™ application when it is launched, invoke Java™ with the command line option -Dhpux.font.dpi=[75|100] or -Dhpux.font=small_fonts.

-Dhpux.im.disable (version 1.2.2 and later) 
This Java™ command line option is a workaround to allow Menu Mnemonics to work. It turns off the input method.

-Dhpux.im.enable.awt (version 1.2.2 and later) 
This option is a workaround to allow the java input method to work with European locale composed characters. Using this option will disable the java input method for the Asian locale.

-Dhp.swing.useFastSwing (version 1.3.1 only) 
This option improves the performance of Swing APIs in Java™ for HP-UX version 1.3.1.

-pa11 (version 1.2.2 and later on PA-RISC only) 
Note: If you run HotSpot with the -pa11 flag or run on a PA 1.1 system, your heap address space will be restricted to 1G.

PA1.1 binaries can be run on both PA1.1 and PA2.0 systems; however, the PA2.0 shared libraries are the default if you are running on a PA2.0 system. You can override the use of the PA2.0 shared libraries on a PA2.0 system by specifying the -pa11 flag.

On a PA2.0 based system, if you invoke Java™ as follows, the default PA2.0 shared libraries are used.

java -version

If you invoke Java™ with the -pa11 option as follows, the PA1.1 shared libraries are used.

java -pa11 -version

-verbosegc 
Prints out the result of a garbage collection to the stdout stream. At every garbage collection, the following 5 fields are printed:

[%T %B->%A(%C), %D]

%T is "GC:" when the garbage collection is a scavenge, and "Full GC:" when it is a full garbage collection. A scavenge collects live objects from the New Generation only, whereas a full garbage collection collects objects from all spaces in the Java™ heap.

%B is the size of Java™ heap used before garbage collection, in KB.

%A is the size after garbage collection, in KB.

%C is the current capacity of the entire Java™ heap, in KB.

%D is the duration of the collection in seconds.
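For illustration only (this helper is not part of the HP tooling), the five fields of such a record can be unpacked with a short Python function; the sample line below is hypothetical but follows the [%T %B->%A(%C), %D] format described above:

```python
import re

# Matches the -verbosegc record format: [%T %B->%A(%C), %D]
GC_LINE = re.compile(
    r"\[(?P<type>GC|Full GC): (?P<before>\d+)->(?P<after>\d+)"
    r"\((?P<capacity>\d+)\), (?P<secs>[\d.]+)\]"
)

def parse_gc_line(line):
    """Return the five fields of a -verbosegc record as a dict, or None."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    for k in ("before", "after", "capacity"):
        d[k] = int(d[k])          # sizes are in KB
    d["secs"] = float(d["secs"])  # duration in seconds
    return d

# Hypothetical sample record: a scavenge shrinking used heap from 4096KB to 1024KB
sample = "[GC: 4096->1024(16384), 0.0234]"
fields = parse_gc_line(sample)
```

A helper like this makes it easy to chart heap usage over time from a captured stdout log.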

-Xbatch 
(Excerpt below from Sun Microsystems' documentation.)

Disable background compilation. Normally, if compilation of a method is taking a long time, the VM will compile the method as a background task, running the method in interpreter mode until the background compilation is finished. The -Xbatch flag disables background compilation so that compilation of all methods proceeds as a foreground task until completed, regardless of how long the compilation takes. This flag is provided for users who desire more deterministic behavior of method compilation for purposes such as benchmarking.

-Xbootclasspath

Specify a colon-separated list of directories, JAR archives, and ZIP archives to search for boot class files. These will be used in place of the default boot class files in the jre/lib/rt.jar and jre/lib/i18n.jar archives normally used by the Java™ 2 software.

-XdoCloseWithReadPending 
The java command line option -XdoCloseWithReadPending allows one thread to close() a socket when there is an outstanding read pending on that same socket from another thread.

For more information on when and how to use this option, refer to "Closing a socket when accept or read is pending" in this chapter.

-Xeprof 
The -Xeprof option generates profile data for HPjmeter. The -Xeprof option controls profiling of Java™ applications running on JRE for HP-UX for the Java™ 2 Platform and collects method clock and CPU times, method call count, and call graph. (For more information on HPjmeter, see HPjmeter Downloads and Documentation.)

Note: Zero preparation profiling is a beta feature of the HP JDK/JRE 5.0.03. It is started from the command line by sending a signal to the JVM to start eprof. Engaging zero preparation profiling may have a short term impact on application performance as the JVM adjusts to the demands of performing dynamic measurements.

To profile your application use the following command:

java -Xeprof:<options> ApplicationClassName

To profile your applet, use:

appletviewer -J-Xeprof:<options> URL

where <options> is a list of <key>[=<value>] arguments separated by commas.

We have found the following options useful in most cases:

For CPU time metrics with minimal intrusion:

-Xeprof

Exact call count information and object creation profiling:

-Xeprof:inlining=disable

To see the complete list of available options, use

java -Xeprof:help

After the profiled applet or application terminates execution, the Java™ Virtual Machine writes the profile data to a file in the current directory. Use HPjmeter to analyze the file.

For more information on -Xeprof and supported -Xeprof options see -Xeprof Options

»Supported -Xeprof options

-XheapInitialSizes 
Use the -XheapInitialSizes option to see the default value for the Java™ Heap.

-Xincgc 
(excerpt from http://java.sun.com/products/hotspot/2.0/README.html)

Enables the incremental garbage collector. The incremental garbage collector, which is off by default, will eliminate occasional garbage-collection pauses during program execution. However, it can lead to a roughly 10% decrease in overall performance. -Xincgc 32-bit PA support is not available on SDK 1.4.0.x. 32-bit PA support for -Xincgc is available beginning with SDK 1.4.1.00.

-Xint 
The HP-UX HotSpot compiler automatically and efficiently converts bytecode to native machine instructions at runtime. Use the java -Xint option to disable the HotSpot compiler only if compiled code is not executing correctly and you have verified the problem with the HP Response Center. With the compiler disabled, the Java™ Virtual Machine interprets all Java™ methods.

-Xmn<size> 
Sets the Java™ new generation heap size. The "new generation" is the first generation in HotSpot's generational garbage collector. (This option replaces the option -XX:NewSize=N.)

Example: -Xmn64m
Default: In 1.3.1, the new generation size is 1/3 of -Xms, the initial or "starting" heap size.

-Xms<size> 
(excerpt from http://java.sun.com/products/hotspot/1.0/README.html)

Specifies the initial size, in bytes, of the memory allocation pool. This value must be a multiple of 1024 greater than 1MB. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes. Do not use this option in conjunction with the -XX:+AggressiveHeap option. Doing so will cause the options to override each other's settings for heap size.

Examples: -Xms6291456
          -Xms6144k
          -Xms6m
Default:  -Xms2m

-Xmx<size> 
(excerpt from http://java.sun.com/products/hotspot/1.0/README.html)

Specifies the maximum size, in bytes, of the memory allocation pool. This value must be a multiple of 1024 greater than 2MB. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes. Do not use this option in conjunction with the -XX:+AggressiveHeap option. Doing so will cause the options to override each other's settings for heap size.

Examples: -Xmx83886080
          -Xmx81920k
          -Xmx80m
Default:  -Xmx64m
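The size arguments above all share one grammar: a byte count, optionally suffixed with k/K or m/M. A hedged Python sketch of that parsing and of the stated multiple-of-1024 constraint:

```python
def parse_size(arg):
    """Convert a -Xms/-Xmx style size argument ("6144k", "6m", "6291456")
    to a number of bytes. Suffix k/K means kilobytes, m/M means megabytes."""
    units = {"k": 1024, "m": 1024 * 1024}
    s = arg.lower()
    if s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)

def valid_initial_heap(n_bytes):
    """-Xms values must be a multiple of 1024 greater than 1MB."""
    return n_bytes % 1024 == 0 and n_bytes > 1024 * 1024

# The three -Xms example values in the text denote the same 6MB pool.
sizes = [parse_size(s) for s in ("6291456", "6144k", "6m")]
```

This shows why the three -Xms example values above are interchangeable: 6291456 = 6144 × 1024 = 6 × 1024 × 1024.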

-Xnocatch 
The -Xnocatch option disables the Java™ "catch-all" signal handler. Use this option to generate clean stack traces from native code.

-Xnoclassgc 
Disables class garbage collection.

-Xoptgc 
The optimistic garbage collection flag. It improves garbage collection performance for applications with mostly short-lived objects. A server-side application that creates many short-lived objects for each transaction is likely to benefit greatly from -Xoptgc. However, this flag should be used with caution; it is not recommended for applications that rapidly build up objects that are not short-lived during the run.

-Xprep 
The -Xprep option is used to dynamically preprocess (modify) bytecodes of the classes loaded by the VM. Its syntax is:

-Xprep <factory_class_name>:<arguments>

where <factory_class_name> is a qualified name of the class that will be used to create the preprocessor, and <arguments> is any string that will be passed to the method creating the preprocessor. The location of the factory class must be specified in the -Xbootclasspath option passed to the VM, together with the location of the appropriate rt.jar.

When the -Xprep option is specified, before loading the application classes, the Java™ VM will load the specified factory class and execute the method in the class declared as:

class <factory_class_name> implements Preprocessor {
    public static Preprocessor createPreprocessor(String arg)
}

where Preprocessor is an interface defined as:

package hp.javatools.bytecode;
public interface Preprocessor {
    public abstract byte[] instrument(String name, byte[] klass);
}

The VM will pass the <arguments> specified in the -Xprep option to the createPreprocessor method as its only argument. The Preprocessor object returned by the invocation will be saved by the VM.

For each subsequently loaded class, the VM will invoke the instrument() method on the Preprocessor object, passing the name of the class being loaded, and the bytecode representation of the class. The returned array of bytes will be used by the VM as the replacement of the original version of the class. If null is returned, the original version of the class will be used.

-Xprof 
(excerpt below from http://java.sun.com/products/hotspot/1.0/README.html)

Profiles the running program, and sends profiling data to standard output. This option is provided as a utility that is useful in program development and is not intended to be used in production systems.

-Xrs (version 1.3.1 and later) 
Reduces use of operating-system signals by the Java™ virtual machine (JVM), allowing for orderly shutdown of a Java™ application. Using the -Xrs option removes the signal handlers (for SIGHUP, SIGINT, and SIGTERM) that run the shutdown hooks used to shut the application down in an orderly fashion. If you use -Xrs, the shutdown hooks won't be run if the application terminates as a result of receiving a SIGHUP, SIGINT, or SIGTERM signal, unless your code explicitly catches these signals and runs the shutdown hooks itself.

-Xshare:on, -Xshare:off, -Xshare:auto 
UseSharedSpaces (class data sharing) is not supported on HP-UX.

-Xrunhprof 
Enables cpu, heap, or monitor profiling. This option is typically followed by a list of comma-separated "<suboption>=<value>" pairs. Run the command java -Xrunhprof:help to obtain a list of suboptions and their default values.

-Xss<size> 
(excerpt below from http://java.sun.com/j2se/1.3/docs/tooldocs/solaris/java.html#options)

Set the maximum native stack size, in bytes, for any thread. Each Java™ thread has two stacks: one for Java™ code and one for C code. This option sets the maximum stack size that can be used by C code in a thread. Every thread spawned during the execution of the program passed to java will have this value as its C stack size. This flag is appropriate for programs that have small thread stack size requirements and/or create several thousand threads, with the potential for running out of virtual memory. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes. <size> must be > 1000 bytes.

Defaults: -Xss512k (Java™ 1.3, 1.4, and 5.0 32-bit mode)
          -Xss1m (Java™ 1.4 and 5.0 64-bit mode)

Note: The default stack size for threads created by the 1.4 and 5.0 64-bit mode JVM is 1MB. On PA-RISC 32- and 64-bit systems, the default stack size is 64KB. Therefore, if you are using C language main programs that attach with JNI, you will want to adjust the stack size to avoid overflows.

-Xusealtsigs (replaces -XX:+UseSIGUSR2 beginning with SDK 1.3.1.13, 1.4.1 and 1.4.2) 
Instructs the JVM to avoid using SIGUSR1 and SIGUSR2 for internal operations (like Thread.interrupt() calls). In SDK 1.4.1 and later, by default the JVM uses both SIGUSR1 and SIGUSR2. In SDK 1.3.1.13, only SIGUSR1 is used. If -Xusealtsigs is used, then two signals halfway between SIGRTMIN and SIGRTMAX will be chosen instead.

-Xverbosegc<options> 
The -Xverbosegc option prints out detailed information about the spaces within the Java™ Heap before and after garbage collection.

Beginning with 1.3.1.14 and 1.4.2.05, the process id will be automatically appended to the verbosegc filename you specify. This helps you to associate a verbosegc output with the corresponding Java™ process, especially in cases where an application executes several Java™ processes.

The syntax of the option is:

-Xverbosegc[:help]|[0|1][:file=[stdout|stderr|<filename>]]

 »For Java™ 1.2 and 1.3 -Xverbosegc:help detailed explanation
 »For Java™ 1.4.0, -Xverbosegc:help detailed explanation
 »For Java™ 1.4.1 and 1.4.2, -Xverbosegc:help detailed explanation
 »For Java™ 5.0, -Xverbosegc:help detailed explanation

In addition, we recommend HP's garbage collection analysis tool HPjtune, which graphically displays the information contained in an -Xverbosegc log. HPjtune is available at no cost from Java™ Technology Software on HP-UX.

For documentation on the new garbage collectors, refer to "Tuning Garbage Collection with the 1.4.2 Java™ Virtual Machine" at http://java.sun.com/docs/hotspot/gc1.4.2/index.html

-XX:+AggressiveHeap 
(excerpt from http://java.sun.com/docs/hotspot/ism.html)

This option instructs the JVM to push memory use to the limit. It sets the overall heap to around 3850MB, the memory management policy defers collection as long as possible, and (beginning with J2SE 1.3.1.05) some GC activity is done in parallel. Because this option sets heap size, do not use the -Xms or -Xmx options in conjunction with -XX:+AggressiveHeap. Doing so will cause the options to override each other's settings for heap size.

Because the -XX:+AggressiveHeap option has specific system requirements for correct operation and may require privileged access to system configuration parameters, it should be used with caution. We have found it to be useful for certain applications that create a lot of short lived objects.

-XX:+AllowUserSignalHandlers 
Instructs the HotSpot JVM not to complain if the native code libraries install signal handlers. This only matters if the handlers were installed when the VM is booting.

-XX:CompileCommandFile=<list of .hotspot_compiler files separated by ":"> (version 1.3.1.10, 1.4.1.06, 1.4.2.00 and later) 
Specifies one or more .hotspot_compiler files listing methods that the JVM should exclude from compilation. Specifying this option overrides the default behavior of the JVM, which is to scan the libjvm.sl directory or the current directory for a .hotspot_compiler file.

-XX:+DisableExplicitGC 
Disables processing of calls to System.gc(); the JVM still performs garbage collection when necessary.

-XX:+ForceMmapReserved 
Use this option to reserve the space for all large memory regions used by the JVM. This includes the Java™ Heap, which is an mmap'ed space. Starting with HP-UX 11.11, the default behavior is that the memory regions be reserved lazily. Most large server-side applications will use all of the space, so improved performance can be obtained by reserving the space at program initialization.

-XX:+HeapDump 
The -XX:+HeapDump option can be used to observe memory allocation in a running Java™ application by taking snapshots of the heap over time. Using the _JAVA_HEAPDUMP=1 environment variable allows memory snapshots to be taken without modifying the java command line.

The HeapDump functionality is available starting with SDK 1.4.2.10 and JDK 1.5.0.03.

To enable this functionality, use the command-line option -XX:+HeapDump or set the environment variable _JAVA_HEAPDUMP=1 before starting the Java™ application. (e.g. export _JAVA_HEAPDUMP=1)

This output is similar to that of -Xrunhprof:heap=dump. The difference is that the thread info (THREAD START) and trace info (TRACE) will not be printed to the output file.

With the option enabled, each time you send the process a SIGQUIT signal the JVM produces a dump of the Java™ heap (a Java™ heap snapshot), in hprof format. The name of the file will have the following format:

java_<pid>_<date>_<time>_heapDump.hprof.txt

By creating a series of these snapshots you can see how the number and size of objects varies over time.

Note: a full GC is executed prior to the Heap snapshot.
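When comparing a series of snapshots, it helps to pull the pid and timestamp back out of the filenames. As an illustration (the regex below is an assumption modeled on the filename format shown above, not an HP tool), in Python:

```python
import re

# Matches java_<pid>_<date>_<time>_heapDump.hprof.txt
HEAP_DUMP = re.compile(
    r"java_(?P<pid>\d+)_(?P<date>\d{6})_(?P<time>\d{6})_heapDump\.hprof\.txt"
)

# A snapshot name of the documented shape
name = "java_27298_060712_153313_heapDump.hprof.txt"
m = HEAP_DUMP.fullmatch(name)
pid = int(m.group("pid"))  # which Java process produced this dump
```

Grouping a directory of snapshots by pid this way makes it straightforward to track one process's heap growth over time.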

Please also note the section on ‘Interaction of HeapDump Options’.

-XX:+HeapDumpOnly and _JAVA_HEAPDUMP_ONLY 
Beginning with SDK 1.4.2.11, the option -XX:+HeapDumpOnly can be used to enable HP heap dumps using the SIGVTALRM signal (signal 20). To enable this feature without altering the java command line, the environment variable _JAVA_HEAPDUMP_ONLY can be set in the user's environment prior to starting the Java™ application.

If _JAVA_HEAPDUMP_ONLY is set, or -XX:+HeapDumpOnly command line option is used, then the HP HeapDump functionality will be triggered by sending signal SIGVTALRM (20) to the process, and the printing of thread and trace information to stdout is suppressed.

The HeapDump is written to a file with the following filename format:

java_<pid>_<date>_<time>_heapDump.hprof.txt

The default output format is ASCII. The output format can be changed to hprof binary format by setting the _JAVA_BINARY_HEAPDUMP environment variable, which specifies that heap dumps be emitted in binary format only. By default, the -XX:+HeapDump and -XX:+HeapDumpOnly options emit heap dump information in ASCII format.

Please note the section on ‘Interaction of HeapDump Options’.

-XX:+HeapDumpOnCtrlBreak 
The command line option -XX:+HeapDumpOnCtrlBreak enables the ability to take snapshots of the java heap when a SIGQUIT is sent to the java process, without using the jvmti-based -Xrunhprof:heap=dump.

This feature is similar to the -XX:+HeapDump option, except that the output is in binary hprof format and is written to a file with the following naming convention:

java_<pid>_heapDump.hprof.<millitime>

The option -XX:+HeapDumpOnCtrlBreak is available starting with SDK 1.4.2.11 and JDK 5.0.05.

Please note the section on ‘Interaction of HeapDump Options’.

-XX:+HeapDumpOnOutOfMemoryError 
The HeapDumpOnOutOfMemoryError command line option causes the JVM to dump a snapshot of the Java™ heap when an out-of-memory error condition has been reached. The heap dump generated by HeapDumpOnOutOfMemoryError is in hprof binary format, and is written to the file java_pid<pid>.hprof in the current working directory. 

The option -XX:HeapDumpPath=<file> can be used to specify the dump filename or the directory where the dump file is created. Running an application with -XX:+HeapDumpOnOutOfMemoryError does not impact performance. Please note the following known issue: the HeapDumpOnOutOfMemoryError option does not work with the low-pause collector (-XX:+UseConcMarkSweepGC). This option is available starting with the SDK 1.4.2.11 and JDK 5.0.04 releases.

Interaction of HeapDump options 
If the HP environment variable _JAVA_HEAPDUMP is set and the option -XX:+HeapDumpOnCtrlBreak is specified, then both the HP ASCII and hprof binary formats will be emitted when SIGQUIT (signal 3) is sent to the process:

java_pid27298.hprof.1152743593943
java_27298_060712_153313_heapDump.hprof.txt

If only the binary format heap dump is desired (on SIGQUIT), then either set only the _JAVA_BINARY_HEAPDUMP environment variable, or use -XX:+HeapDumpOnCtrlBreak (without setting the _JAVA_HEAPDUMP environment variable). If _JAVA_BINARY_HEAPDUMP is set and -Xrunhprof:heap=dump is used, then both the runhprof ASCII file and the HP binary file are produced. If _JAVA_HEAPDUMP_ONLY is set, heap dumps are triggered by the SIGVTALRM signal (20) instead of SIGQUIT (3), and only the heap dump is produced (the thread and trace dump to the application's stdout is suppressed). With both _JAVA_HEAPDUMP_ONLY and _JAVA_BINARY_HEAPDUMP set, a binary format heap dump is produced when SIGVTALRM is sent to the process, and no thread trace is written to stdout.

-XX:MainThreadStackSize=<value> 
Specifies the main/primordial thread stack size. The main/primordial thread is the first thread when a process is created. It is the thread that has the main method. Other Java™ threads are controlled by -Xss<size>. See also "Using JNI - main/primordial thread stack size limits."

-XX:MaxDirectMemorySize=<size> 
Specifies the maximum amount of memory in bytes that the Java™ NIO library can allocate for direct memory buffers. The default is 64 megabytes, which corresponds to 
-XX:MaxDirectMemorySize=64m. The use of direct memory buffers can minimize the copying cost when doing I/O operations.

-XX:MaxNewSize=<size> 
Sets the maximum size of the new generation, in bytes. The argument can be followed by either 'k' or 'm' to specify KB or MB. (For 1.2, specify KB only.) Not supported in 1.3.

-XX:MaxPermSize=<size> 
Sets the maximum size of permanent generation (in bytes). Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes. For example, -XX:MaxPermSize=32m specifies a value of 32MBytes for MaxPermSize (Note that in SDK 1.2.2, this option took an integer that specified a value in kbytes only).

Default: 64MB

-XX:PermSize=<size> 
Specifies the initial size, in bytes, of the Permanent Space memory allocation pool. This value must be a multiple of 1024 greater than 1MB. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes.

Examples: -XX:PermSize=6291456
          -XX:PermSize=6144k
          -XX:PermSize=6m
Default:  -XX:PermSize=16m (1.4 and later)

-XX:NewSize=<size> 
Sets the default size, in bytes, of the new generation. Append the letter k or K to indicate kilobytes, or m or M to indicate megabytes. (For 1.2, specify KB only.) Not supported in 1.3.

-XX:NewSizeThreadIncrease=<size> 
Sets the additional size, in bytes, added to desired new generation size per non-daemon thread. (Note that in SDK 1.2.2, this option took an integer that specified a value in kbytes only).

-XX:-NoHandoff (version 1.3 and later) 
-XX:-NoHandoff works in conjunction with the functionality added by HP-UX 11.11 patch PHCO_25226. In HP-UX 11.22 and later, this option has less impact than on HP-UX 11.11.

-XX:PrefetchCopyIntervalInBytes=n 
Prefetch n bytes ahead for scavenge copy destination area. The default is 0. The ideal value may vary depending on the processor.

-XX:PrefetchScanIntervalInBytes=n 
Prefetch n bytes ahead for scavenge scan area. The default is 0. The ideal value may vary depending on the processor.

-XX:PrefetchScavengeFieldsAhead=n 
Prefetch the next n fields of objects during scavenge. The default is 0. The ideal value may vary depending on the processor.

-XX:+PrefetchMarkSweep 
Use prefetching during full garbage collections. The default is true.

-XX:SchedulerPriorityRange=SCHED*,base,top 
This option can be used both to select the scheduling policy and to map the Java™ thread priorities, 1 (low) through 10 (high), to the underlying HP-UX thread priorities:

    SCHED* is one of the scheduling policies:

      SCHED_FIFO
      SCHED_RR
      SCHED_RR2
      SCHED_RTPRIO
      SCHED_OTHER
      SCHED_HPUX
      SCHED_NOAGE (a subset of the range of priorities supported by SCHED_TIMESHARE)

      SCHED_TIMESHARE is not supported as an option, but is the same as SCHED_HPUX; use of any such non-supported SCHED* value will result in using the default policy, SCHED_HPUX.

    base - the HP-UX minimum value for the set of priorities to be mapped

    top - the HP-UX maximum value for the set of priorities to be mapped

  The format can be:

    -XX:SchedulerPriorityRange=<scheduler>,<base>,<top>
                         or    <scheduler>
                         or    <scheduler>,<base>
                         or    <scheduler>,,<top>

Normally, the underlying system interfaces allow you to obtain the max and min priorities:

  sched_get_priority_max()
  sched_get_priority_min()

In Java™, however, the range of priorities must be passed to the JVM at start-up, so the values need to be known prior to starting the application.

For each of the scheduling policies:

  POSIX PRIORITIES: SCHED_FIFO, SCHED_RR, SCHED_RR2

  The POSIX priorities range from 0 to 31, where 31 is the highest and 0 is the lowest. The number of priority levels can be changed with the kernel parameter 'rtsched_numpri' (possible values are 32-512).

  REALTIME PRIORITIES: SCHED_RTPRIO

  The range is -128 to -1.

  TIMESHARE PRIORITIES: SCHED_HPUX, SCHED_NOAGE

  For SCHED_HPUX, the default range is -256 (base) to -129 (top).

    -129 to -154 are system priorities available only to 'root' users

    -179 to -256 are user timeshare priorities

  For SCHED_NOAGE, the range is -256 (base) to -154 (top). For HP-UX 11.11 and later releases, the highest value is -179 (top).

An example of the use of the option on HP-UX 11.11 and later HP-UX releases is:

  -XX:SchedulerPriorityRange=SCHED_NOAGE,-199,-179

The SCHED_NOAGE scheduling policy will be used, so a thread's priority will remain fixed.

The Java™ thread priorities will map as:

   1: -199
   2: -197
   3: -195
   4: -193
   5: -191
   6: -189
   7: -187
   8: -183
   9: -181
  10: -179

-XX:+ServerApp 
Enables a set of -XX options which may make long-running server applications run faster. The -XX options and their values, such as the tunable option for thread-local sizes, are modified over time based on the application results we observe. For each release, the options as well as the values may be different, depending upon the default values of the -XX options. We recommend that you test whether this set enhances the performance of your application before you use the option in production.

-XX:SurvivorRatio=<size> 
Ratio of eden/survivor space size. The default is 8, meaning that eden is 8 times bigger than from and to, each. Here are some examples:

Xmn / (SurvivorRatio + 2) = size of from and to, each 
( Xmn / (SurvivorRatio + 2) ) * SurvivorRatio = eden size

If your new generation heap is 100MB, then by the formula above the space reserved for objects that survive a garbage collection is 100MB/(8+2), or 10MB for each survivor space. Raising this value may improve overall application performance when the New space is large and/or when your application keeps only a very low percentage of objects alive.
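As a quick check of the arithmetic above (an illustrative Python sketch, assuming -Xmn100m and the default ratio of 8):

```python
def survivor_and_eden(xmn_mb, survivor_ratio=8):
    """Apply the -XX:SurvivorRatio formulas from the text:
    from/to (each) = Xmn / (SurvivorRatio + 2)
    eden           = from/to * SurvivorRatio
    """
    from_to = xmn_mb / (survivor_ratio + 2)  # size of from and to, each
    eden = from_to * survivor_ratio          # eden size
    return from_to, eden

# 100MB new generation with the default ratio of 8
from_to, eden = survivor_and_eden(100)
```

With the default ratio, eden plus the two survivor spaces (8x + x + x) always accounts for the whole -Xmn value.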

-XX:+UseCompilerSafepoints (PA-RISC 1.3.1, 1.4 and later, Itanium 1.4.2 and later) 
Enables compiler safepoints. Enabling compiler safepoints guarantees a more deterministic delay to stop all running Java™ threads before doing a safepoint operation, namely garbage collection and deoptimization. In HP SDK 1.3.1 and 1.4 releases, compiler safepoints are off by default, and you may also require an HP-UX patch. Refer to the release notes for your SDK for more information.

-XX:+UseOnStackReplacement (PA-RISC 1.3.1, 1.4 and later, Itanium 1.4.2 and later) 
Enables on stack replacement. On stack replacement enables the interpreter to go into compiled code while it is executing the same instance of the method call. For example, if the VM is executing a method that has a loop with a large number of iterations, an intra-method hotspot will occur. To get better performance, the method should run in compiled mode instead of interpreted mode. If you enable on stack replacement, you should also enable compiler safe points (see the previous option). In HP SDK 1.3.1 and 1.4 releases, on stack replacement is off by default, and you may also require a HP-UX patch. Refer to the release notes for your SDK for more information.

-XX:+UseParallelGC (version 1.4 and later) 
Use parallel garbage collection. The parallel collector has been enhanced in 5.0 to monitor and adapt to the memory needs of the application in order to eliminate the need to tune command-line options to achieve the best performance. For a synopsis of garbage collection features, refer to http://java.sun.com/j2se/1.5.0/docs/guide/vm/gc-ergonomics.html

-XX:+UseSIGUSR2 (for SDKs 1.4.0.x and 1.3.1.00 through 1.3.1.12) 
Replaced by the -Xusealtsigs option.

Instructs the JVM to use SIGUSR2 for internal operations like Thread.interrupt() calls instead of SIGUSR1, the default. This allows you to better support third-party middleware applications that in some versions want to use SIGUSR1 for similar purposes in their native code.

-XX:+UseGetTimeOfDay 
Instructs the JVM to use the gettimeofday call instead of the mechanism used in earlier versions, whereby the number of CPU ticks since the application started is used to calculate the current time. With the newer mechanism, changes to the system date or time using date(1), adjtime(2), or time synchronization utilities such as ntp are not reflected in the date and time that Java™ returns until the process is restarted. If your application requires that Java™ immediately reflect such system time changes, you can use the -XX:+UseGetTimeOfDay option; however, you may notice a drop in performance.

Features

FastSwing (version 1.3x only) 
FastSwing is an HP feature which provides significant performance improvement for Swing Applications on a Remote X-Server. In HP's SDK 1.4.x and later, the FastSwing option is ignored because the Java™ 1.4 performance enhancements provide out-of-the-box performance for both local and remote displays, equivalent to FastSwing.

To use this feature invoke java or appletviewer as follows:

/opt/java1.3/bin/java -Dhp.swing.useFastSwing=true MyApp 
or 
/opt/java1.3/bin/appletviewer -J-Dhp.swing.useFastSwing=true applet.html

Currently we recommend using this feature only for Remote displays as it has the following caveat:

Double-buffered Swing Components cannot perform Graphics2D operations with the FastSwing feature turned on. When doing so you will get the following exception:

java.lang.ClassCastException: sun.awt.motif.X11OffScreenImage
        at BezierAnimationPanel.run(BezierAnimationPanel.java:223)
        at java.lang.Thread.run(Unknown Source)

Non-blocking I/O Poll API (SDK 1.3 and 1.4, deprecated in 5.0) 
com.hp.io.Poll supports a general mechanism for reporting I/O conditions associated with a set of FileDescriptors and for waiting until one or more conditions becomes true. Specified conditions include the ability to read or write data without blocking, and error conditions. Use of com.hp.io.Poll dramatically reduces the number of threads required to support large numbers of clients in large server-side Java™ applications.
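com.hp.io.Poll is HP-specific and deprecated in 5.0; the standard java.nio Selector API (available since J2SE 1.4) provides the same readiness-based multiplexing. A minimal sketch using only the standard library, not HP's API:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class PollSketch {
    public static void main(String[] args) throws IOException {
        // One selector watches many channels, so a single thread
        // can service many clients -- the same goal as com.hp.io.Poll.
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0)); // bind to any free port
        server.configureBlocking(false);       // required before register()
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Wait up to 100 ms for a readiness event; none is expected here,
        // since no client ever connects in this sketch.
        int ready = selector.select(100);
        System.out.println("ready channels: " + ready);

        server.close();
        selector.close();
    }
}
```

In a real server, the loop would iterate over selector.selectedKeys() and accept or read on each ready channel.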

IPv6 support (Internet Protocol version 6) - SDK 1.4.2.x and later

IPv6 is a set of Internet Protocol specifications designed to provide enhancements over the capabilities of the existing IPv4 service in terms of scalability, security, mobility, ease-of-configuration, and real-time traffic handling.

For more information about IPv6, see Sun Microsystems' Networking IPv6 User Guide for J2SDK/JRE 1.4 at http://java.sun.com/j2se/1.4/docs/guide/net/ipv6_guide/

HP-UX 11i v1 (11.11), 11i v2 (11.23), and 11i v3 (11.31) support dual protocol stacks IPv4 and IPv6. IPv6 is not currently supported on HP-UX 11.0 or 11.22 (11i v1.5). To support IPv6, HP-UX 11i v1 (11.11) requires HP-UX patches; HP-UX 11i v2 (11.23) and 11i v3 (11.31) do not. See the following table for IPv6 support information.

IPv6 Support in HP-UX Java Releases

HP-UX Java Platform   | Install 11.11 Patches? | Install 11.23 Patches? | Install 11.31 Patches? | Default Protocol Stack | Use of Properties File?
----------------------|------------------------|------------------------|------------------------|------------------------|------------------------
SDK 1.4.2.x           | Yes                    | No                     | No                     | IPv4                   | No
JDK 5.0–5.0.14        | Yes                    | No                     | No                     | IPv4                   | No
JDK 5.0.15 and later  | Yes                    | No                     | No                     | IPv4                   | Yes
JDK 6.0–6.0.02        | Yes                    | No                     | No                     | IPv4                   | No
JDK 6.0.03 and later  | Yes                    | No                     | No                     | IPv6                   | Yes

For the availability of HP-UX patches required for IPv6 support on HP-UX 11.11, please see

»Patch Information
»TOUR Transition Patches for HP-UX 11i

Setting IPv4 or IPv6 Support

The default protocol stack (IPv4 or IPv6) is set through the java.net.preferIPv4Stack system property. To override the default, set the property on the Java command line as follows:

IPv4 support:

-Djava.net.preferIPv4Stack="true"

IPv6 support:

-Djava.net.preferIPv4Stack="false"

 

Beginning with the JDK 5.0.15 and JDK 6.0.03 releases, you can use the properties file: 

  {JAVA_HOME}/jre/lib/net.properties 

to set IPv4 support: 

  java.net.preferIPv4Stack=true 

or to set IPv6 support: 

  java.net.preferIPv4Stack=false
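The property is read early in JVM startup, so it should be set on the command line or in net.properties rather than programmatically. A small sketch for confirming what the running JVM saw (run without the flag, it prints the fallback value "unset"):

```java
public class StackCheck {
    public static void main(String[] args) {
        // Reads the property back; "unset" is just this sketch's
        // fallback, not a value the JVM itself uses.
        String prefer =
            System.getProperty("java.net.preferIPv4Stack", "unset");
        System.out.println("java.net.preferIPv4Stack=" + prefer);
    }
}
```

Launching with `java -Djava.net.preferIPv4Stack=true StackCheck` would print `true` instead.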

 

Thread Local Heap

posted Jul 21, 2010, 7:35 PM by Kuwon Kang   [ updated Jul 21, 2010, 7:36 PM ]

Thread local heap

The Garbage Collector (GC) maintains areas of the heap for fast object allocation.

The heap is subject to concurrent access by all the threads that are running in the JVM, so it must be protected by a resource lock: one thread must complete its updates to the heap before another thread is allowed in. Access to the heap is therefore single-threaded. However, the GC also maintains areas of the heap as thread caches, or thread local heaps (TLHs). A TLH is an area of the heap that is allocated as a single large object, marked non-collectable, and assigned to a thread. The thread can then sub-allocate objects below a defined size from its TLH. No heap lock is needed, so allocation is very fast and efficient. When a cache becomes full, the thread returns the TLH to the main heap and grabs another chunk for a new cache.

A TLH is not subject to a garbage collection cycle; it is a reference that is dedicated to a thread.
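The mechanism can be pictured with a conceptual sketch. This is not the JVM's internal code: CHUNK, the sizes, and the integer "addresses" are illustrative stand-ins. Each thread bump-allocates from its own chunk and takes the shared heap lock only when the chunk is exhausted.

```java
public class TlhSketch {
    static final int CHUNK = 1024;          // bytes handed to each thread
    static final Object heapLock = new Object();
    static int mainHeapUsed = 0;            // stand-in for the shared heap

    // Per-thread cache: {base address, bytes used}. Starting "full"
    // forces the first allocation to fetch a chunk from the main heap.
    static final ThreadLocal<int[]> tlh =
        ThreadLocal.withInitial(() -> new int[] {0, CHUNK});

    static int allocate(int size) {
        int[] cache = tlh.get();
        if (cache[1] + size > CHUNK) {
            // Cache exhausted: take the heap lock once to refill.
            synchronized (heapLock) {
                cache[0] = mainHeapUsed;
                mainHeapUsed += CHUNK;
            }
            cache[1] = 0;
        }
        int addr = cache[0] + cache[1];     // lock-free bump allocation
        cache[1] += size;
        return addr;
    }

    public static void main(String[] args) {
        System.out.println("first addr: " + allocate(16));
        System.out.println("second addr: " + allocate(16));
    }
}
```

The second allocation never touches heapLock, which is the whole point: the common case is a pointer bump in thread-private memory.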

Threading

posted Jul 21, 2010, 7:06 PM by Kuwon Kang

Introduction

This document gives you an overview of the relationship between the threading models used by the Solaris™ operating environment and the Java™ thread model. Choices you make about Solaris threading models can have a large impact on the performance of your Java runtime environment on the Solaris operating environment.

The Java programming language is naturally multi-threaded and because of this the underlying OS implementation can make a substantial difference in the performance of your application. Fortunately (or unfortunately), you can choose from multiple threading models and different methods of synchronization within the model, but this varies from VM to VM. Adding to the confusion, the threads library will be transitioning from Solaris 8 to 9, eliminating many of these choices.

Version 1.1 is based on green threads and won't be covered here. Green threads are simulated threads within the VM and were used prior to going to a native OS threading model in 1.2 and beyond. Green threads may have had an advantage on Linux at one point (since you don't have to spawn a process for each native thread), but VM technology has advanced significantly since version 1.1 and any benefit green threads had in the past is erased by the performance increases over the years.


Solaris Threading Models

Two different threading models are available in Solaris: a many-to-many model and a one-to-one model. Many-to-many and one-to-one refer (essentially) to LWPs (lightweight processes) and Solaris Threads. Each LWP is backed by a kernel thread; once a thread is on an LWP, the kernel schedules it on a CPU. A thread that is not on an LWP must first be scheduled onto one by the thread library before it can reach a CPU. Why would this be an advantage? Because many LWPs consume a lot of state and kernel resources, so fewer LWPs keep the kernel light and nimble and improve performance. Why would this be a disadvantage? You can get thread starvation if a thread doesn't get scheduled on an LWP in an adequate amount of time.

Looking at the following figure:

Java threads are really Solaris Threads, since the native OS threading model has been used since the 1.2 VM. The left side shows the many-to-many model, where Solaris threads are scheduled by the Solaris libthread.so library to run on LWPs. The LWPs are one-to-one with kernel threads. The right-hand side shows a one-to-one model, which marries Solaris threads with LWPs. This creates more LWPs (since one must exist for each thread), and we'll explore the effects later.

The many-to-many model is the default in pre-Solaris 9. Solaris 8 offers an "alternate" threading library for a true one-to-one model, but on Solaris 7 and earlier you can only use the many-to-many model (or fake a one-to-one model with bound threads). Just to throw in a little more confusion, when creating Solaris threads, one can specify that a thread be bound to an LWP for its entire life. This effectively creates a one-to-one model, but has some overhead, which we'll also explore later.


Synchronization

In the many-to-many model, HotSpot allows two types of synchronization: LWP based and thread based. In Solaris documentation terms, these are the equivalents of USYNC_PROCESS (LWP based) and USYNC_THREAD (thread based); look in /usr/include/sys/synch.h in your local Solaris installation. LWP based synchronization is considered heavier weight, since it must work between processes, whereas thread based synchronization is private to a process. Thread based synchronization is all that's required for an application, since multiple Java virtual machines currently don't have the ability to share global data between them; nevertheless, both types of synchronization are allowed in J2SE 1.3 and beyond, and you'll see why in a moment.

Combinatorial review

So, let's review our options; remember that not all of these are relevant to all J2SE releases or even all versions of the Solaris OS.

Feature                                    | pre-Solaris 8   | Solaris 8       | Solaris 9
-------------------------------------------|-----------------|-----------------|-----------------
Many-to-Many, thread based synchronization | 1.3*, 1.4       | 1.3*, 1.4       | Not Available
Many-to-Many, lwp based synchronization    | 1.2*, 1.3, 1.4* | 1.2*, 1.3, 1.4* | Not Available
One-to-One, via Bound threads              | 1.3, 1.4        | 1.3, 1.4        | Not Available
One-to-One, via Alternate Threads library  | Not Available   | 1.2, 1.3, 1.4   | 1.2*, 1.3*, 1.4*

* Note: the default model for this VM.

While reading this table, realize that certain Solaris OS versions will not allow something the VM is capable of. For example, even though the VMs can all use the one-to-one model via the alternate threads library, it's not available on Solaris 7. Also, you can see that the alternate thread library in Solaris 8 will become the only thread library in Solaris 9, which means the many-to-many model will be officially retired. This means that the many-to-many model, available in J2SE versions 1.2, 1.3, and 1.4, cannot be utilized with Solaris 9.

What may seem somewhat disturbing in Solaris 9, the deletion of threading model options, has actually simplified things quite a bit, and the performance thus far has been excellent: only the most severely tuned code degrades, and most code improves by a good margin. Give the alternate threads library a try, since it is available now in Solaris 8. The good news is that it is not necessary to recompile your code; the interfaces all remain the same, and using the alternate threads library in Solaris 8 is accomplished by simply changing your LD_LIBRARY_PATH to include /usr/lib/lwp.
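Switching to the alternate one-to-one libthread on Solaris 8 is therefore just an environment change made before launching the VM. A sketch (MyApp is a placeholder class name; this applies to Solaris 8 only, not Solaris 9, where the one-to-one library is already the default):

```shell
# Prepend the alternate (one-to-one) threads library -- Solaris 8 only.
export LD_LIBRARY_PATH=/usr/lib/lwp:${LD_LIBRARY_PATH:-}
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
# java MyApp   # the VM would now resolve libthread from /usr/lib/lwp
```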

How do I get these Models?

Now that you know all about these various models and synchronization techniques, it's time to try them out. I'll tell you now that if you have very few threads, you won't see much of a difference in your application, with the exception that you could have thread starvation occurring with 1.3 or 1.4 in its default mode on Solaris 8 or before. Note that this table is for pre-Solaris 9 only.
Feature                                    | 1.2 option | 1.3 option                 | 1.4 option
-------------------------------------------|------------|----------------------------|---------------------------
Many-to-Many, thread based synchronization | n/a        | default                    | -XX:-UseLWPSynchronization
Many-to-Many, lwp based synchronization    | default    | -XX:+UseLWPSynchronization | default
One-to-One, via bound threads              | n/a        | -XX:+UseBoundThreads       | -XX:+UseBoundThreads
One-to-One, via Alternate Libthread*       | export LD_LIBRARY_PATH=/usr/lib/lwp (all three versions)
*Note: Do not add /usr/lib/lwp to your LD_LIBRARY_PATH on Solaris 9 as the alternate libthread is the default.

What we have observed in the Java Performance Group

We have found two problems with the default model in 1.3, but have the options above in order to get around them.

In general, the many-to-many model with thread based synchronization is fine, although in rare cases we've seen thread starvation with moderate numbers of threads (somewhere near the number of cpus). In one study we slowly increased the number of threads on the system up to 2x the number of cpus and had each thread doing an equal amount of work. We then measured the difference between the thread doing the most work and the thread doing the least work, and found that with the "alternate" libthread on Solaris 8 the difference dropped from 29% to 8%. Performance, however, was not affected by much (1-2%). This experiment taught us that the thread model does not generally make a performance difference, but when you see thread starvation, try any of the alternative models. You'll note that in 1.2, lwp based synchronization was the default, so it doesn't suffer from thread starvation.

The other problem is with scalability, another form of thread starvation: we've found that not enough LWPs are created to deal with large numbers of threads. In studies on 30 cpu machines running 2000 threads, we found that the ratio of Solaris Threads to LWPs was around 2:1, and that this severely restricted the throughput of compute-bound applications. LWPs are usually created when threads block in the kernel, but if your application doesn't block and simply performs computation, you can see reduced performance.
When using -XX:+UseLWPSynchronization, the ratio went to 1:1, which gives us one LWP for every Solaris Thread, although those threads are not bound to the LWPs (they can hop around from LWP to LWP). This produced a 7x throughput improvement. Moving to a one-to-one model via bound threads, which you might expect to behave the same as LWP synchronization since we have a 1:1 ratio between LWPs and Solaris Threads, showed a decrease of over 80% (worst case). This was unexpected, but there must be some pretty bad overhead in binding Solaris Threads to LWPs. Finally, going to the one-to-one model with the "alternate" libthread on Solaris 8 (and running on Solaris 9), we found the best performance: an increase of 15% over LWP synchronization, and nearly a factor of 8x over the standard model with thread based synchronization. This may not be typical, but it shows the extreme sensitivity of a heavily threaded application.

Here's a table of results on various Solaris boxes, all running Solaris 8 with JVM 1.3.1:

Architecture | Cpus | Threads  | Model                | %diff in throughput (against Standard Model)
-------------|------|----------|----------------------|---------------------------------------------
Sparc        | 30   | 400/2000 | Standard             | ---
Sparc        | 30   | 400/2000 | LWP Synchronization  | 215%/800%
Sparc        | 30   | 400/2000 | Bound Threads        | -10%/-80%
Sparc        | 30   | 400/2000 | Alternate One-to-one | 275%/900%
Sparc        | 4    | 400/2000 | Standard             | ---
Sparc        | 4    | 400/2000 | LWP Synchronization  | 30%/60%
Sparc        | 4    | 400/2000 | Bound Threads        | -5%/-45%
Sparc        | 4    | 400/2000 | Alternate One-to-one | 30%/50%
Sparc        | 2    | 400/2000 | Standard             | ---
Sparc        | 2    | 400/2000 | LWP Synchronization  | 0%/25%
Sparc        | 2    | 400/2000 | Bound Threads        | -30%/-40%
Sparc        | 2    | 400/2000 | Alternate One-to-one | -10%/0%
Intel        | 4    | 400/2000 | Standard             | ---
Intel        | 4    | 400/2000 | LWP Synchronization  | 25%/60%
Intel        | 4    | 400/2000 | Bound Threads        | 0%/-10%
Intel        | 4    | 400/2000 | Alternate One-to-one | 20%/60%
Intel        | 2    | 400/2000 | Standard             | ---
Intel        | 2    | 400/2000 | LWP Synchronization  | 15%/45%
Intel        | 2    | 400/2000 | Bound Threads        | -10%/-15%
Intel        | 2    | 400/2000 | Alternate One-to-one | 15%/35%

As you can see, this experiment on 2 and 4 cpu boxes yielded quite different results. LWP Synchronization was the best on 2 cpus, and the "alternate" thread library was the same as LWP Synchronization with 4 cpus. Using bound threads continued to show either no gain or a significant decrease in throughput. Going to only 400 threads on a 2 cpu box showed that LWP Synchronization was on par with the standard model, bound threads cost 30%, and the alternate libthread cost 10%. On a 4 cpu Solaris Intel box we saw similar results to the Sparc box, but with bound threads performing better and showing little to no degradation over the standard model.

Finally, we have also seen more predictability when shying away from the standard model with thread based synchronization. Variability due to thread starvation seems to disappear when moving to any other model.


Other considerations when scaling to a large number of threads


Besides the threading model, there are other things you may want to consider when moving to a large number of threads, namely:
  • The thread stack size
  • Thread local heap
  • Garbage collection effects
  • Using Intimate Shared Memory

The default thread stack size is quite large: 512kb on Sparc and 256kb on Intel for the 1.3 and 1.4 32-bit VMs, 1mb for the 64-bit Sparc 1.4 VM, and 128kb for 1.2 VMs. If you have many threads (in the thousands) you can waste a significant amount of stack space. You can change the stack size via the -Xss flag; the minimum setting is 64k in 1.3 and 1.4, and 32k in 1.2.
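The arithmetic behind that warning is simple; this sketch just multiplies the figures quoted above (the 2000-thread count is an example, not a recommendation):

```java
public class StackMath {
    public static void main(String[] args) {
        // 2000 threads at the default 512 KB Sparc stack size reserve
        // about 1 GB of address space, versus 125 MB at -Xss64k.
        int threads = 2000;
        long defaultKb = 512;  // default 32-bit Sparc stack size
        long minKb = 64;       // minimum for 1.3/1.4, set via -Xss64k
        System.out.println("default: " + (threads * defaultKb / 1024) + " MB");
        System.out.println("with -Xss64k: " + (threads * minKb / 1024) + " MB");
    }
}
```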

TLEs (in 1.3) or TLABs (in 1.4) are thread local portions of the heap used in the young generation (see the HotSpot Garbage collection Tuning Document). These offer excellent speedups on smaller numbers of threads (100s), but when moving up to larger numbers of threads the thread local heap can consume a significant amount of the total heap, so much so that garbage collection may occur more frequently. You can turn off thread local heaps completely with -XX:-UseTLE in 1.3 and -XX:-UseTLAB in 1.4. Alternatively you can size the thread local heap with -XX:TLESize=<value> in 1.3 and -XX:TLABSize=<value> in 1.4. Please note that TLEs/TLABs are only on by default in the Sparc -server JVM.

Garbage collection can radically affect performance as well. Please see the document on tuning garbage collection.

ISM, or Intimate Shared Memory, can also be used to boost the performance of memory-intensive applications. This is a highly specialized option and needs a few operating system parameters to be set in order to enable it, but it can provide an additional 10% or more performance. Please see Big Heaps and Intimate Shared Memory for more details.

Conclusion

Choosing a different Solaris threading model may have an impact on your performance. The 1.3 and 1.4 VMs give you a myriad of options to choose from so that you can determine what's best for your application. The default model in 1.3, although generally fine, is not the best for applications with large numbers of threads or cpus. Our suggestion is to try various threading models if your application contains more than one thread. Also, make sure that you look at other factors that could affect your performance when you're attempting to scale to larger numbers of threads or cpus.
