This directory contains Cache Scope, a tool for collecting data-centric cache
information on the Itanium 2 processor.

The directories inside this one are:

bin       The Cache Scope binary will be put here when built.

lib       The libraries used by Cache Scope will be put here when built.

classes   The Java classes for the Cache Scope data viewer will be put here
          when built.

include   Contains include files that may be used by users of Cache Scope
          (in order to call Cache Scope functions from a program that is
          being measured, for instance to control when counting is started
          and stopped).

tools     Contains the source and build directories for the various parts
          of Cache Scope (not included in the binary distribution).

To build Cache Scope, go into the "tools" directory and type "make".
Everything will be built and put into the appropriate directories.

Note that building the data viewer "cscopeview" requires that Java be
installed. If Java is not installed on the Itanium machine where the rest of
Cache Scope is being built, the measurement tool and data file viewer can be
built separately on different machines. To do this, instead of typing "make"
in the tools directory, run "make" in tools/cscope on an Itanium machine to
build the measurement tool "cscope", and run "make" in tools/cscope_view on a
machine with Java to build the data file viewer.

To run cscope, you need to have the perfmon kernel extension installed.
Currently cscope only works with version 1.X of this extension (running 2.4
Linux kernels). See the perfmon page

    http://www.hpl.hp.com/research/linux/perfmon/

for information on how to download and install this extension. This extension
should be in the standard 2.4 kernels. Contact bugs@dyninst.org for
up-to-date information on additional supported kernel versions.

You also need some of the GNU support libraries (stdc++, gcc_s, dwarf). If
you don't have these handy, the distribution page for Cache Scope includes a
tar file with them in it.
If you extract this tar file in the .../cscope/lib directory, they will be
properly loaded.

------------------------------------------------------------------------------

To set up to run Cache Scope:

If Cache Scope is installed in the directory myDir:

1) Add myDir/bin to your PATH variable

2) Add myDir/lib to your LD_LIBRARY_PATH environment variable

3) Define the environment variable CSCOPE_CLASSES to be myDir/classes

4) Define the environment variable DYNINSTAPI_RT_LIB to be
   myDir/lib/libdyninstAPI_RT.so.1

Run the cscope command, giving the name of the program you'd like to measure.
The usage for the cscope command is:

    cscope [cscope options] <program name> [program options]

Program name is the name of the program you want to measure, and program
options are the options you want to pass to the program. The cscope options
can include:

-v
    Verbose - print more warning/error/informational messages.

-data
    Specifies a name for the output file containing the results. By
    convention, this should end in the suffix .cscope. The default is
    results.cscope.

-debug-out
    Specifies a name for a file to which Cache Scope will write various
    debugging/informational messages. By default no such file is created.

-sample-frequency
    Sets the base sampling frequency. At least this number of cache events
    will go by between samples. The default is 1 (so that the number will be
    determined mainly by the random number generator, as described for
    -sample-randomness).

-sample-randomness
    Sets parameters for the random number generator that varies the sampling
    frequency. The first number (given in hexadecimal) is a bit mask that is
    used as follows: each time an interrupt occurs and a sample is taken, the
    number of cache events that should go by before the next interrupt is set
    to a new value, which is the sum of the base sampling frequency and a
    random number ANDed with the mask. The default value is 0xffff.
    So, to use the default as an example, if the sampling frequency is 1,
    then that will be added to a random number between 0 and 65535 (0xffff)
    to produce the number of cache events that should take place before the
    next interrupt; the resulting number of events will be from 1 to 65536.
    The second number, which is optional, is a seed for the random number
    generator.

-no-sampling
    Tells the tool not to perform sampling. Statistics that do not rely on
    sampling are still collected, such as the total number of L1 data cache
    read misses.

-stack
    If this option is specified, then a stat bucket is created for the stack
    frame of each function (multiple invocations of the same function will
    use the same stat bucket for their stack frames). If a function is named
    "myfunc", then the stat bucket for cache events in its stack frame is
    called . If this option is not specified, the default is to treat the
    entire stack as a single object with the name . Note that enabling this
    option adds extra overhead to sampling.

------------------------------------------------------------------------------

Communicating with Cache Scope from a program being measured:

Sometimes it is useful for a program that is being measured to communicate
with Cache Scope, for instance in order to specify a certain point in the
execution to start or stop measurement. This is done by linking with
libcscope.a (which can be found in TOP/lib) and calling functions within it.
These functions are stubs that do nothing when uninstrumented (or, as will be
described below, simply pass their parameters on to a standard C library
function). When Cache Scope instruments a program, however, it replaces these
functions with others that interact with the Cache Scope instrumentation.

In C or C++, a user can include the file cscope.h (found in TOP/include) to
get declarations for these functions. The functions are described below:

void dctl_start_measurement()
    Tells Cache Scope to begin measurement.
    If there is no call to this function in the application, then Cache Scope
    will begin measurement from the start of execution.

void dctl_stop_measurement()
    Tells Cache Scope to stop measurement. If there is no call to this
    function in the application, Cache Scope will stop measurement when the
    application exits.

void *dctl_mallocn(size_t size, char *name)
void *dctl_callocn(size_t n, size_t size, char *name)
void *dctl_memalignn(size_t boundary, size_t size, char *name)
    These functions call malloc, calloc, and memalign, respectively, to
    allocate memory. The memory is placed into a stat bucket with the name
    given by the "name" parameter.

void *dctl_mallocf(size_t size, char *fmt, ...)
void *dctl_callocf(size_t n, size_t size, char *fmt, ...)
    These functions call malloc and calloc, respectively, to allocate memory.
    The memory is placed into a stat bucket whose name is built from the
    parameter "fmt", which is treated as a printf-style format string, and
    the parameters that follow it.

void *dctl_malloc_non_obj(size_t size)
void *dctl_calloc_non_obj(size_t n, size_t size)
void *dctl_realloc_non_obj(void *ptr, size_t size)
void *dctl_memalign_non_obj(size_t boundary, size_t size)
void dctl_free_non_obj(void *ptr)
    These functions call malloc, calloc, realloc, memalign, and free,
    respectively, to allocate or free memory. The memory is not placed into
    any stat bucket. This is useful when writing a specialized memory
    allocator. In such a case, you may want to allocate memory that is not
    considered part of a stat bucket, and then add parts of it to possibly
    different stat buckets as they are allocated for various uses. Note that
    memory allocated by one of these functions should always be freed with
    dctl_free_non_obj instead of just free.

void dctl_register_mem(void *ptr, size_t size)
void dctl_register_memn(void *ptr, size_t size, char *name)
void dctl_register_memf(void *ptr, size_t size, char *fmt, ...)
    These functions register a section of memory as belonging to a named stat
    bucket.
    This is generally useful in conjunction with the _non_obj functions
    described above. dctl_register_mem registers the memory with an
    automatically assigned name, dctl_register_memn registers it with the
    name given by the "name" parameter, and dctl_register_memf registers it
    with a name generated by the format string and the other parameters
    given.

extern void dctl_unregister_mem(void *ptr)
    Removes a section of memory registered with one of the dctl_register_mem
    functions from the memory map.

extern void dctl_register_moved_mem(void *old_addr, void *new_addr,
                                    size_t new_size)
    Tells Cache Scope that a block of memory registered with a
    dctl_register_mem function has been moved and possibly resized. old_addr
    is the original registered address, new_addr is the new address, and
    new_size is the new size.

------------------------------------------------------------------------------

Examining the results from Cache Scope (cscope_view):

The program cscope_view displays the results in the .cscope file from a Cache
Scope run. This program is written in Java, so to run it you will need a Java
virtual machine. Assuming again that TOP is the top of your source tree, you
can run cscope_view by typing:

    TOP/cscope/bin/cscopeview [filename]

Filename is optional. It is the name of the file to open and view in the
tool. If you do not specify a name (or you want to load a different file
while running the program), you can choose "Open..." on the File menu to load
a file.

Once a file is open, the tabs allow you to choose between viewing a list of
the objects causing the most latency, the functions causing the most latency,
and the statistics about the size and number of objects in each stat bucket.
The list of objects causing the most latency can be filtered by function, so
that you see the objects causing the most latency in a particular function or
set of functions. The list of functions causing the most latency can be
filtered by object.
Both of these are done by using the list box on the left side of the window.
Note that when a filter is active, the %latency column shows the percentage
of the latency in the filtered objects/functions that is caused by each stat
bucket, while the % of total latency column shows the same latency as a
percentage of the total latency in the application.
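
------------------------------------------------------------------------------

Sketch of the sampling interval arithmetic:

The interval rule described for the -sample-frequency and -sample-randomness
options can be illustrated with a short C sketch. This is not taken from the
Cache Scope source. The names next_sample_interval and check_default_example
are hypothetical, and the random value is passed in by the caller (rather
than drawn from a generator) so that the arithmetic can be checked by hand.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Cache Scope.
 * Returns the number of cache events to let pass before the next sample:
 * the base sampling frequency plus a random value ANDed with the mask.
 */
uint64_t next_sample_interval(uint64_t base_frequency, uint64_t mask,
                              uint64_t random_value)
{
    return base_frequency + (random_value & mask);
}

/*
 * Checks the worked example from the option description: with a base
 * frequency of 1 and the default mask 0xffff, the interval ranges from
 * 1 (random bits all zero) to 65536 (random bits all one).
 */
void check_default_example(void)
{
    assert(next_sample_interval(1, 0xffff, 0) == 1);
    assert(next_sample_interval(1, 0xffff, 0xffff) == 65536);
}
```

Because the mask is applied with a bitwise AND, any bits of the random value
above the mask are discarded, so a larger mask widens the range of intervals
and a mask of 0 makes every interval equal to the base frequency.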