
Dima, Alden
 Gaithersburg 
National Institute of Standards and Technology
 USA 
 
Alden Dima
 Computer Scientist
National Institute of Standards and Technology
 100 Bureau Dr. Stop 8970 Gaithersburg (Maryland)  USA (20899-8970)
Email: alden.dima@nist.gov Web site: http://www.nist.gov
 Biography
 Alden Dima is a Computer Scientist in the Information Technology Laboratory at the National Institute of Standards and Technology in Gaithersburg, Maryland, where he helped develop Viper, a Java-based parser/diagnostic tool for the Virtual Reality Modeling Language (VRML). He is the principal designer and implementor of VMView, a Java application and applet execution-tracing tool. Currently, Alden is a Consulting Engineer for the Real-Time Extensions for the Java™ Platform Expert Group. He has a BEE from the Georgia Institute of Technology, an MS (in Electrical Engineering) from the University of Connecticut and an MS (in Computer Science) from the George Washington University. Prior to joining NIST in 1997, Alden worked for about eight years as an engineer and programmer in the electric power utility and telecom industries, where he worked on (among other things) the development of large-scale customer service-related systems. He is a member of the IEEE and the IEEE Computer Society.
 

Introduction

 Many development tools such as execution tracing tools generate a large amount of text output. The large quantity of text makes it hard for the user to sort through and make sense of the output. In addition, the output is often formatted in a manner that makes automated filtering inconvenient.
 One such tool is VMView (http://www.nist.gov/vmview), a diagnostic tool that can be embedded at runtime into the Java Virtual Machine (JVM). It is currently under development at the National Institute of Standards and Technology. VMView provides a means to capture the internal state changes of the JVM during the execution of Java applications and applets as an execution trace, which can then be analyzed.
 We will discuss our experience with converting this diagnostic tool to produce and use output based on the Extensible Markup Language (XML). The use of an XML-based output format for software diagnostic tools solves several problems. It provides a convenient output format suitable for input to a filtering tool - allowing the user to more effectively manage complex data. It also allows for the separation of the data from its presentation - something that is more difficult to accomplish with the standard text formatting techniques used by most tools.
 

What is VMView?

 Java™ applications and applets are executed on the Java Virtual Machine (JVM), which provides a platform-neutral environment [2]. VMView (http://www.nist.gov/vmview) is a diagnostic tool that can be embedded at runtime into the Java Virtual Machine. It is currently under development at the National Institute of Standards and Technology. VMView captures the internal state of the Java Virtual Machine during the execution of Java applications and applets in the form of an execution trace, which can then be analyzed. Some of the information captured in the trace is difficult to obtain otherwise.
 The "pre-XML" version of VMView consisted of a Win32 DLL written in C++, and a Tcl/Tk-based GUI. The DLL uses the Java Virtual Machine Debugger Interface (JVMDI) and the Java Native Interface (JNI). The execution trace is a formatted ASCII file that can be browsed by the VMView GUI.
 VMView has several potential uses including debugging Java applications and identifying potentially malicious code. These two uses are discussed below.
 

Debugging Java Applications

 Debugging a Java application requires knowing what is really going on inside that application at run-time. There are two basic strategies for doing this: controlled execution and execution tracing [3]. Controlled execution is implemented by debuggers and interrupts the program so that its state can be examined. It focuses on a single program state at a time. In contrast, execution tracing allows a program to run without intervention and captures program states as a sequence of events.
 Java programmers typically implement these two strategies in the following ways:
 
  •  They use a Java debugger such as Sun's jdb.
  •  They add print statements to their code to create a simple execution trace.
  •  They run their applications using a debugging version of the Java virtual machine called java_g to generate a more complete execution trace.
 Each of these strategies has its problems. A debugger is a powerful tool, but like a microscope, it makes it difficult to find a small "feature" in a large area. For example, many Java programs are multi-threaded GUI programs that don't have a nice sequential execution. Where does one begin to look if there is a subtle problem? Print statements have their own costs: debugging code that outputs the state of the program in certain critical places must be either removed or deactivated prior to shipping the final product. The addition and removal of debugging code can be time consuming and may introduce errors into the final program. Programmers also rarely have time to develop a consistent debugging message style or advanced tracing features.
 Prior to Java 2, the Java Development Kit contained a debugging version of the Java interpreter, java_g, which could trace method calls and bytecode during program execution [2]. With the release of Java 2, java_g is no longer a part of the JDK. Despite its potential, program tracing via java_g was rather inflexible - it was an all or nothing proposition. Once enabled, the tracing occurs for all classes loaded by the virtual machine and the traces could quickly grow to huge proportions. For example, consider the canonical "Hello World" program:
 
public class HelloWorld {
    public static void main(String[] argv) {
        System.out.println("Hello, World!");
    }
}
 If executed using java_g's bytecode tracing option (-t), the trace output contains over 149,000 lines or about 6.4 MB of data. When tracing method calls using the -tm option, the trace output contains 4,819 lines or about 400 kilobytes of data.
 

Tracing Java Applications with VMView

 VMView is intended to provide a flexible alternative to debuggers, debugging code and java_g traces. Like a debugger, VMView will provide certain types of "low-level" information that cannot be obtained via print statements. However, because VMView allows Java applications and applets to run without intervention and captures information for later analysis, the developer will also be given a broader picture of the state of the program during execution. VMView provides a record of the program state during execution, but since it operates at the VM level, application-level source code modifications are not necessary. In addition, the traces will always have a consistent format and meaning. Unlike the old java_g-based tracing, VMView is much more flexible and focused. For example, when tracing the Hello World program described above, VMView produces a 6-line, 366-byte bytecode trace and a 25-line, 492-byte method call trace. The difference in size between the JDK-based trace and the VMView-based trace is due to the fact that VMView-based tracing can be tailored by the user - VMView will restrict its tracing to the methods of one or more user-specified "trace" classes, whereas java_g traces all loaded classes. In addition to debugging applications, VMView also lends itself to software quality assurance, software measurement and security testing.
 The following is a small sample of a non-XML-based VMView trace. It would normally appear inside VMView's GUI text widget. The trace was created using the SimpleExample demo that is included with the Java Development Kit 1.2. This example was recompiled with the "-g" option so that local variable information is included in the class file.
 
THREAD_START: Thread[main,5,main]
CLASS_LOAD
    THREAD: Thread[main,5,main]
    CLASS: [Ljava/lang/String;

METHOD_TRACE
    METHOD: LSimpleExample; main([Ljava/lang/String;)V
    LOCATION: 0 (method start)
    THREAD: Thread[main,5,main]
    PARAMETERS:
        [Ljava/lang/String; s = [Ljava.lang.String;@48a82fb6

BYTECODE_TRACE
    METHOD: LSimpleExample; main([Ljava/lang/String;)V
    THREAD: Thread[main,5,main]

00000: new
00003: dup
00004: invokespecial
 This trace fragment shows the application thread created by VMView starting and the invocation of the main method. After the main method invocation, several of the method's bytecodes are shown. VMView does not currently show the bytecode operands. They can be determined in part by using javap with the "-c" option.
 A benefit of tracing with VMView is that Java exceptions that are caught and handled internally by an application are now visible. For example, during its execution, the previously discussed demo application throws a series of exceptions that are caught and handled internally. As a result, the user is never aware of their presence. One such exception is:
 
EXCEPTION
    TYPE: java.lang.ClassNotFoundException: javax/swing/plaf/basic/resources/basic_en_US
    THROWN_BY: Thread[main,5,main]
    THROWN_AT: Ljava/lang/ClassLoader; loadClass(Ljava/lang/String;Z)Ljava/lang/Class; @ 32
    TARGET: Ljava/lang/ClassLoader; loadClass(Ljava/lang/String;Z)Ljava/lang/Class; @ 39
 From a user point-of-view, the ability to detect these types of exceptions can be important. For example, a hostile class could secretly test the security policy in effect and hide any resulting security exceptions thrown by the VM. VMView-style tracing can detect this type of activity. Excessive exception throwing by classes in an application or library may also be indicative of an inadequate design.
 For the programmer, the ability to trace exceptions quickly reduces the temptation to modify source code in order to track down problems involving exceptions that are caught but perhaps not handled correctly. The less the source code is modified, the fewer additional errors are introduced during debugging.
 

VMView Issues

 Despite what has been accomplished with VMView, there is room for improvement in a number of areas. We have identified the following issues:
 
  •  Traces can be excessively large - Despite VMView's ability to focus tracing on specific classes, traces can easily reach 10 MB or more in size. VMView would benefit greatly from the ability to filter traces.
  •  The trace format is fixed - Differing uses of VMView may require unique trace formats. The plain ASCII trace output is difficult to translate to other formats.
  •  Context-sensitive searches are difficult - Similar strings can appear in different contexts, making it difficult to easily narrow the search.
  •  It is inconvenient to view traces with other tools - We would like to easily take advantage of other browsers and viewers. This would give us the ability to focus our development on the core VMView technology rather than having to divert our attention to recreating basic GUI functionality.
 

VMView/XML Goals

 We are currently in the process of reworking VMView to make use of XML. Our goals for this new version of VMView are:
 
  •  To generate efficient, human-friendly XML output that is easy to maintain and debug and is compatible with VMView's internal design.
  •  To allow for the filtering of the execution trace using regular expression pattern matching.
  •  To be able to browse the trace using unrelated tools such as third party Web browsers.
  •  To enable easy translation of the trace to other formats.
  •  To facilitate efficient and complex searches of the traces.
 

VMView/XML Architecture

 We have re-architected VMView as shown in Figure 1. The VMView DLL has been modified to generate an XML-based trace. A new Swing/JFC-based GUI has been created that allows the user to filter the XML-based trace and invoke a standard Web browser to view the trace.
 
 Figure 1. VMView/XML Architecture
 

Generating XML-based Traces

 The VMView execution trace document type definition (DTD) evolved as we worked with filtering and translating the XML-based trace. The first two VMView/XML goals directly impacted the design of the XML-based trace output. Execution tracing is both processor- and disk-intensive. The structure of the execution trace has to be relatively efficient. However, a "human-friendly" structure also facilitates debugging and maintenance, so a balanced approach is essential, especially during early development. Since the new trace filtering capabilities are critical, the structure of the trace must also reflect its needs. We discovered that a flatter, less nested design simplified the design of the trace filter and took advantage of this fact in our work.
 The major change required to generate an XML-based trace was to find and modify all the ASCII-text generating output code in the VMView DLL. The VMView execution trace DTD was also embedded with the DLL so that all VMView output contains its logical structure regardless of how the VMView DLL is used.
 These changes were straightforward. The XML-based output is generated using C++'s iostream facility. Individual tags were defined in a header file. For example, the following defines the tags for a method call event:
 
const string CALL_O = string("<CALL>");
const string CALL_C = string("</CALL>");
 The trace itself is created using the C++ string concatenation operator to build up each item in the trace. For example, a portion of the method call element is created as:
 
traceLine += IN_1 + CLASS_O + className + CLASS_C
    + IN_1 + MTD_O + CDATA_O + methodName + CDATA_C + MTD_C
    + IN_1 + SIG_O + methodSig + SIG_C
    + IN_1 + LOC_O + locationString + LOC_C
    + IN_1 + THRD_O + threadString + THRD_C;
 IN_1 is a formatting instruction that indents the output one level. It is one of several formatting instructions that make the trace more human readable.
 There were places where we had to use CDATA sections to keep the parser from interpreting element data as markup. For example, the Java Virtual Machine uses the method name <init> for object constructors. Object instance string representations can also contain characters that can be misinterpreted as XML markup, requiring the use of CDATA sections.
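 The CDATA wrapping described above can be sketched as follows. This is an illustrative Java fragment, not VMView's C++ code; the class and method names (CdataSketch, cdata, element) are hypothetical. As an assumption beyond what the DLL is known to do, it also handles data that itself contains the sequence ]]>, which cannot appear inside a single CDATA section.

```java
public class CdataSketch {
    /** Wraps text in a CDATA section so characters such as '<' are not parsed as markup. */
    public static String cdata(String text) {
        // "]]>" would end the section early, so split it across two adjacent sections.
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Builds a leaf element whose content is protected by a CDATA section. */
    public static String element(String tag, String content) {
        return "<" + tag + ">" + cdata(content) + "</" + tag + ">";
    }

    public static void main(String[] args) {
        // A constructor name, which would otherwise look like a tag to the parser.
        System.out.println(element("MTD", "<init>"));
    }
}
```

 Wrapping every method name, rather than only those that need it, keeps the generating code uniform at the cost of a few extra bytes per element.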
 The result of these design issues can be seen in the following trace event element:
 
<THROW>
    <MSG><![CDATA[java.util.EmptyStackException]]></MSG>
    <THRD>Thread[main,5,main]</THRD>
    <CLASS>Ljava/util/Stack;</CLASS>
    <MTD><![CDATA[peek]]></MTD>
    <SIG>()Ljava/lang/Object;</SIG>
    <LOC>16</LOC>
    <CLASS2>Ljava/util/Stack;</CLASS2>
    <MTD2><![CDATA[main]]></MTD2>
    <SIG2>()Ljava/lang/Object;</SIG2>
    <LOC2>26</LOC2>
</THROW>
 As previously mentioned, the filtering of traces was also simplified by using a flatter format rather than a more deeply nested format. Evidence of this approach can be seen in the exception event element shown above. The last four elements have a "2" suffix instead of being nested inside a <HANDLER> element.
 

Filtering Traces

 Once a trace is generated, events can be filtered using a user-specified set of regular expressions. The user can filter on multiple event fields and across multiple event types simultaneously. The user interface associated with trace filtering is shown in Figure 2.
 
 Figure 2. The VMView trace filtering user interface.
 Two XML-related APIs were available to implement the trace filtering mechanism [1]. The Simple API for XML (SAX) is an event-driven API for processing XML documents. It reads the content as a stream of data and reports markup tags to the application as events. The Document Object Model (DOM) takes an alternative approach: it makes the entire document available to the processor in a "random-access" mode.
 Because of the potential for large documents, we have implemented the filter using SAX rather than DOM to avoid excessive memory requirements. The use of DOM implies the existence of a large number of Java objects, one for each XML element in an execution trace. With a SAX-based implementation, the trace filter only has to deal with one VM event at a time. With proper care, the filter can be implemented to reduce the burden on Java's garbage collector and improve overall performance. The IBM SAX-based parser has worked well with some rather large traces.
 The trace filter extends org.xml.sax.HandlerBase, which provides default implementations of four of the SAX interfaces, including the document handler interface. It is associated with a new parser instance via the parser's setDocumentHandler method and responds to events generated by the parser. For example, the parser will call the trace filter's characters method when an element's character data is parsed. The trace filter will respond by adding this content to the Java object that represents the current element being parsed. When the parser generates the endElement event and the element in question represents a VM event, the element object will check itself against a set of user-specified regular expressions to determine if it belongs in the trace. If it does, it will print itself. Figure 3 illustrates trace filtering as a Unified Modeling Language (UML) sequence diagram. UML is rapidly becoming the dominant notation in object-oriented programming and consists of a number of standardized diagrams. UML sequence diagrams are particularly useful for illustrating the handling of messages and events within a system.
 
 Figure 3. A UML sequence diagram for the filtering of a VMView trace. Time increases downward along the vertical axis. The participating VMView components are shown across the top from left to right.
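 The event handling described above can be sketched in Java as follows. This is a minimal illustration under several assumptions, not the VMView implementation: it uses the later SAX2 DefaultHandler and the JAXP parser bundled with Java instead of HandlerBase and the IBM parser, the root element name TRACE and the class name TraceFilterSketch are hypothetical, and an event is kept only if every user-supplied pattern matches the corresponding field. It relies on the flat trace format, in which fields are direct children of an event element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TraceFilterSketch extends DefaultHandler {
    private final Map<String, Pattern> filters;          // field name -> regular expression
    private final List<String> kept = new ArrayList<>(); // events that passed the filter
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> fields;                  // fields of the event being parsed
    private String eventName;
    private int depth = 0;                               // 1 = root, 2 = event, 3 = field

    public TraceFilterSketch(Map<String, Pattern> filters) {
        this.filters = filters;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {            // a new VM event begins
            eventName = qName;
            fields = new LinkedHashMap<>();
        } else if (depth == 3) {     // a new field begins
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per field, so accumulate.
        if (depth == 3) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (depth == 3) {
            fields.put(qName, text.toString());
        } else if (depth == 2 && matches()) {
            kept.add(eventName + " " + fields);
        }
        depth--;
    }

    /** True if every filter pattern finds a match in the corresponding field. */
    private boolean matches() {
        for (Map.Entry<String, Pattern> f : filters.entrySet()) {
            String value = fields.get(f.getKey());
            if (value == null || !f.getValue().matcher(value).find()) {
                return false;
            }
        }
        return true;
    }

    public static List<String> filter(String xml, Map<String, Pattern> filters) throws Exception {
        TraceFilterSketch handler = new TraceFilterSketch(filters);
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.kept;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<TRACE>"
            + "<CALL><CLASS>LSimpleExample;</CLASS><MTD><![CDATA[main]]></MTD></CALL>"
            + "<THROW><MSG><![CDATA[java.util.EmptyStackException]]></MSG>"
            + "<CLASS>Ljava/util/Stack;</CLASS></THROW>"
            + "</TRACE>";
        Map<String, Pattern> byClass =
            Collections.singletonMap("CLASS", Pattern.compile("java/util"));
        for (String event : filter(xml, byClass)) {
            System.out.println(event);
        }
    }
}
```

 Only one event's fields are held in memory at a time, which is the property that makes a SAX-based filter attractive for large traces.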
 

Viewing Traces

 To minimize the development effort, we chose to make use of existing Internet browsers to view the output traces. At present VMView implements a Java-based XML to HTML 4.0 translator that translates the execution trace and embeds a cascading style sheet prior to viewing. We investigated directly browsing the XML-based trace via style sheets, but in the end decided to use HTML translation. We looked at using CSS2 but the current browser implementations do not provide the needed CSS2 functionality for our application. The key missing feature was the ability to generate text via the CSS2 content keyword. This is necessary since for our application, we want to generate a field label next to the element value when viewing the trace. Future implementations will re-implement this functionality using XSL.
 The translation to HTML is not a problem since the trace format suits our current needs. There is no need to perform radical transformations - yet. In the future, we plan to directly browse the XML traces via XSL-based style sheets. The XML to HTML translation is accomplished via a relatively simple Java program that uses regular expressions to match and replace the XML tags with HTML tags and generated text. We made use of the HTML 4.0 class attribute and span tags and embedded a CSS style sheet into the output. Figure 4 shows how a VMView trace appears in Microsoft Internet Explorer 5.
 
 Figure 4. Viewing a VMView/XML trace.
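 The regular expression-based translation can be sketched as follows. This is an assumption-laden illustration rather than the VMView translator: the class name TraceToHtmlSketch is hypothetical, the generated label is simply the tag name, and only leaf field elements are rewritten (event elements and the embedded style sheet are left out).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceToHtmlSketch {
    // A leaf element whose content is either one CDATA section (group 2)
    // or plain text (group 3); the backreference \1 pairs the open and close tags.
    private static final Pattern FIELD = Pattern.compile(
        "<(\\w+)>(?:<!\\[CDATA\\[(.*?)\\]\\]>|([^<]*))</\\1>");

    /** Replaces each leaf field with a labeled HTML span carrying the tag name as its class. */
    public static String translate(String xml) {
        Matcher m = FIELD.matcher(xml);
        StringBuffer html = new StringBuffer();
        while (m.find()) {
            String tag = m.group(1);
            // CDATA content is raw and must be escaped; plain text is already escaped XML.
            String value = m.group(2) != null ? escape(m.group(2)) : m.group(3);
            String span = "<span class=\"" + tag + "\">" + tag + ": " + value + "</span>";
            // quoteReplacement keeps '$' and '\' in the data from being read as group references.
            m.appendReplacement(html, Matcher.quoteReplacement(span));
        }
        m.appendTail(html);
        return html.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(translate("<MTD><![CDATA[<init>]]></MTD>"));
        System.out.println(translate("<LOC>16</LOC>"));
    }
}
```

 Emitting the tag name as the class attribute lets a single embedded style sheet control how each field type is displayed.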
 

Searching Traces

 A VMView trace can be searched for events through repeated filtering. What is currently lacking, however, is the ability to relate an element in a filtered trace back to its original trace so that a better picture of its context emerges. A simple way of accomplishing this would be to give each VM event a unique ID, so that it can quickly be found in the original trace after filtering. A better approach would involve modifying the trace filter to maintain an index of the events in the original trace for quick lookup afterwards. We plan to further develop this approach.
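 One way the index idea might look is sketched below. It is purely illustrative and makes several assumptions: the event ID is taken to be the event's ordinal position in the original trace, the index records the line number reported by the SAX Locator for each event, and the names EventIndexSketch and index are hypothetical. It uses the SAX2 DefaultHandler bundled with Java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class EventIndexSketch extends DefaultHandler {
    // lines.get(id) is the line in the original trace where event number id was reported.
    private final List<Integer> lines = new ArrayList<>();
    private Locator locator;
    private int depth = 0;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator; // supplied by the parser before parsing starts, if supported
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        depth++;
        if (depth == 2) { // direct children of the root are VM events
            lines.add(locator == null ? -1 : locator.getLineNumber());
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        depth--;
    }

    /** Returns, for each event in document order, the line at which it was reported. */
    public static List<Integer> index(String xml) throws Exception {
        EventIndexSketch handler = new EventIndexSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<TRACE>\n<CALL></CALL>\n<THROW></THROW>\n</TRACE>";
        List<Integer> idx = index(xml);
        for (int id = 0; id < idx.size(); id++) {
            System.out.println("event " + id + " reported at line " + idx.get(id));
        }
    }
}
```

 A filter that carried each event's ID through to its output could then use such an index to jump back to the event's surroundings in the unfiltered trace.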
 We are also considering other means of building XML-based trace searching into VMView/XML. Web browsers such as Microsoft Internet Explorer do provide simple text searching capabilities, but we would like to create a browser-based search facility for complex context-sensitive searches. This could be implemented either using the browser's internal XML parser or a Java applet.
 

Benefits of using XML

 While working on VMView/XML, we discovered several benefits to using an XML-based approach when implementing diagnostic tools such as VMView. We were able to use a validating XML parser to automatically ensure that the tool's output is well formed and follows the logical structure defined in the DTD. This saved us from either having to perform additional tedious manual verification of the output or having to create a custom parser for this purpose.
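 A validity check of this kind can be sketched with the validating parser bundled with Java. The DTD below is a hypothetical, much reduced stand-in for the VMView DTD, embedded as an internal subset in the way the trace output carries its own DTD; the class name TraceValidatorSketch is likewise an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class TraceValidatorSketch {
    // Hypothetical two-field DTD, standing in for the real trace DTD.
    public static final String DTD = "<!DOCTYPE TRACE ["
        + "<!ELEMENT TRACE (CALL)*>"
        + "<!ELEMENT CALL (CLASS, MTD)>"
        + "<!ELEMENT CLASS (#PCDATA)>"
        + "<!ELEMENT MTD (#PCDATA)>"
        + "]>";

    /** Returns true if the document is well formed and valid against its own DTD. */
    public static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity errors are ignored by default, so rethrow them
                }
            });
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = DTD + "<TRACE><CALL><CLASS>LFoo;</CLASS><MTD>bar</MTD></CALL></TRACE>";
        String bad = DTD + "<TRACE><CALL><MTD>bar</MTD></CALL></TRACE>"; // CLASS is missing
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

 Running such a check over sample traces during development catches both malformed output and output that drifts from the declared structure.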
 It was easy to implement output filtering using existing XML parsers and to translate the trace to other formats even when using our simple "brute force" techniques. The alternative would have been to implement a traditional parser or to restrict the tool's plain ASCII text output to a format that can be more easily filtered by utilities that use regular expressions. Future work with XSL promises to extend and further simplify transformations of VMView traces.
 We were able to use existing Web browsers to view the tool's output. As a result, we gained the basic navigation and printing without having to re-implement this functionality for our needs. This reduced our development time and tool installation footprint.
 

Conclusions

 The use of an XML-based output format for software diagnostic tools solves several problems. It gives the output a convenient format suitable for filtering - allowing the user to more effectively manage the data complexity. For VMView, trace filtering is an important feature because of the size of unfiltered traces.
 XML also allows for the separation of the data from its presentation - something that is more difficult to accomplish with the standard text formatting techniques used by most tools. This in turn also allows for the use of standard Web browsers to view the tool output, reducing development time and effort. VMView's XML-based trace was easily converted to HTML for browsing by browsers such as Microsoft's Internet Explorer. This allows us to spend more time on VMView's core functionality rather than on its user interface.
 

Use of trade name products and trademarks

 The presence or absence of a particular trade name product does not imply criticism or endorsement by the National Institute of Standards and Technology, nor does it imply that the products identified are necessarily the best available.
 Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
 

References

 
  1.  Neil Bradley, The XML Companion, Addison-Wesley, 1998.
  2.  David Flanagan, Java in a Nutshell (2nd Edition), O'Reilly and Associates, 1997.
  3.  Arthur Vargas Lopes, Very High-Level Debugging: Evaluation of Diagnosis and Solutions for Ada Concurrent Programs, Doctoral Dissertation, The George Washington University, Washington, D.C., 1992.
