Kerninst API Reference Guide

  1. Introduction
  2. Interface Reference

1. Introduction

This document describes the Kerninst Application Programming Interface (KAPI), which allows a programmer to build tools for modifying an OS kernel on the fly. There are many reasons why you may want to do this, for example: performance profiling (inject timers at selected locations in the kernel), tracing (insert calls to tracing routines anywhere you want), dynamic code optimization (replace a function with a better-optimized version on the fly).

The KAPI is provided as a collection of C++ classes, described in the next section, and represented by two files. The first file is a C++ header, "kapi.h", that defines the interface available to the mutators. The second file is a C++ library, libkerninstapi.a, that contains the implementation of the interface. Typically, mutators will '#include "kapi.h"' and link against libkerninstapi.a.

If you are looking for an introduction to usage of the KAPI, please refer to the Kerninst API Programming Guide, which provides an overview of the system and usage examples.

2. Interface Reference

As mentioned above, the KAPI is structured as a collection of C++ classes. Many of these classes directly correspond to the major instrumentation abstractions: kernel modules, functions, basic blocks, instrumentation points, code expressions, kernel memory regions. Naturally, they allow the programmer to query and manipulate the corresponding objects.

Most methods of KAPI classes are synchronous -- they wait for the requested action to complete. However, there are a few methods that are asynchronous: they initiate an operation and return to the caller immediately. Later on, the caller will be notified of a completion event via a callback. To support the callback functionality, every instrumentation tool has to handle KAPI events periodically. There are several ways this functionality can be integrated in the mutator. See "Handling API Events" for details.

As a general rule, each method that can fail returns an error code of the ierr integer type. Zero corresponds to successful completion, while negative values represent various error codes. Below we describe the interface of each API class. Refer to the API header, "kapi.h", for further details.
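The return-code convention above can be checked in one place. The following is a minimal sketch, not part of KAPI: the helper name kapi_ok is hypothetical, and the typedef of ierr as int is an assumption based on the description of ierr as an integer type.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: ierr is a plain integer type, as the text describes it.
typedef int ierr;

// Hypothetical helper: returns true on success (non-negative code) and
// reports the failing operation and its error code otherwise.
inline bool kapi_ok(ierr rc, const char *what) {
    assert(what != nullptr);
    if (rc < 0) {
        std::fprintf(stderr, "%s failed with error code %d\n", what, rc);
        return false;
    }
    return true;
}
```

A mutator could wrap each call that returns ierr, for example kapi_ok(manager.attach(machine, port), "attach"), and bail out when the helper returns false.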

2.1 Class kapi_manager

kapi_manager is the main class in the hierarchy. It is the entry point for most instrumentation activities: allocating/freeing kernel memory, copying to and from kernel memory, browsing through kernel code (locating modules and functions), instrumenting/de-instrumenting, handling instrumentation events, dispatching callbacks.

2.1.1 Initialization/Cleanup


kapi_manager();

Constructs an uninitialized instance of kapi_manager. Call attach() on this instance to initialize it.

ierr attach(const char *machine, unsigned port);

This method attaches kapi_manager to the kernel on the specified machine and port number. All KAPI actions should be carried out only after this method has been called on the global kapi_manager.

ierr detach();

This method will detach from the kernel, remove remaining instrumentation and perform other clean-up tasks. Typically, it is called on exit from a tool.
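Since detach() is typically called on exit from a tool, one way to make sure it always runs is a small scope guard. This is a sketch under stated assumptions: attach_guard is a hypothetical helper, not a KAPI class, and it is written as a template so that it only assumes a manager type with attach(const char *, unsigned) and detach() methods returning zero on success.

```cpp
#include <cassert>

// Hypothetical RAII helper (not part of KAPI). Manager is any type that
// exposes attach(const char *, unsigned) and detach(), both returning an
// integer error code where zero means success.
template <typename Manager>
class attach_guard {
public:
    attach_guard(Manager &mgr, const char *machine, unsigned port)
        : mgr_(mgr), attached_(false) {
        assert(machine != nullptr);
        attached_ = (mgr_.attach(machine, port) == 0);
    }

    // Detach only if the attach in the constructor succeeded.
    ~attach_guard() {
        if (attached_) {
            mgr_.detach();
        }
    }

    bool attached() const { return attached_; }

    attach_guard(const attach_guard &) = delete;
    attach_guard &operator=(const attach_guard &) = delete;

private:
    Manager &mgr_;
    bool attached_;
};
```

With the real class, the guard would be instantiated as attach_guard<kapi_manager> near the top of main(), and the tool would check attached() before issuing any other KAPI request.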

2.1.2 Operations on Kernel Memory

kptr_t malloc(unsigned nbytes);

Allocate a region of kernel memory of nbytes bytes. Returns the starting address (integer) of the region on success, or zero if allocation fails.

ierr free(kptr_t addr);

Free the region starting at addr, which should have been previously returned by kapi_manager::malloc(). Returns an error code if the free request fails.

ierr memcopy(kptr_t from_addr, void *to_addr, unsigned nbytes);

Copy nbytes bytes from the kernel space starting at address from_addr, into the user-level buffer specified by to_addr.

ierr memcopy(void *from_addr, kptr_t to_addr, unsigned nbytes);

Copy data in the opposite direction (from user to the kernel). Notice the difference in types for kernel and user addresses: kernel addresses are integers, user addresses are pointers.

void to_kernel_byte_order(void *data, unsigned size);

If the client and the kernel run on architectures with different byte ordering, this routine byte-swaps the provided user data to match the kernel byte ordering.

void to_client_byte_order(void *data, unsigned size);

Similarly to the method above, this routine converts the data copied from the kernel to match byte ordering of the client.
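To illustrate what such a conversion amounts to, here is a minimal sketch of an in-place byte reversal for a single value. It is not the KAPI implementation: reverse_bytes and host_is_little_endian are hypothetical helpers, and treating the buffer as one value of size bytes is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reverse the bytes of one value of `size` bytes in place.
inline void reverse_bytes(void *data, unsigned size) {
    assert(data != nullptr);
    unsigned char *p = static_cast<unsigned char *>(data);
    std::reverse(p, p + size);
}

// Hypothetical helper: detect the byte order of the machine running this code.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

In a mutator, the KAPI routines take care of this: call to_kernel_byte_order() on user data before copying it into the kernel, and to_client_byte_order() on data copied out of the kernel.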

2.1.3 Operations on Kernel Code Structures

unsigned getNumModules() const;

Returns the total number of modules loaded in the kernel.

ierr findModule(const char *mod_name, kapi_module *mod);

Finds a kernel module with requested mod_name. Fills-in the provided mod object to represent this module in the API. Returns an error code if the module was not found.

ierr getAllModules(kapi_vector<kapi_module> *vec);

Retrieves a vector of all modules in the kernel.

ierr findFunctionByAddress(kptr_t addr, kapi_function *func);

Finds the function starting at addr and initializes func. Returns an error code if the function was not found.

ierr findModuleAndFunctionByAddress(kptr_t addr,
			            kapi_module *mod, kapi_function *func);

Finds a module which has a function starting at addr. Fills-in both mod and func objects. Returns an error code if the function was not found.

2.1.4 Instrumentation-related methods

int insertSnippet(const kapi_snippet &snippet, const kapi_point &point);

Inserts the snippet at the specified point in the kernel and returns its handle. The return value represents either a positive snippet handle on success, or a negative error code on failure.

ierr removeSnippet(int handle);

Removes the previously-inserted snippet associated with the supplied handle. Returns an error code if the handle is invalid.

int uploadSnippet(const kapi_snippet &snippet, const kapi_vector<kapi_point> &points);

Uploads a snippet into the kernel, but does not splice it yet. It can later be spliced at any of the given points. The points vector may be empty (which will generate code that can be spliced anywhere) at the expense of having less efficient instrumentation. NOTE: This function is only supported in the SPARC/Solaris distribution.

ierr removeUploadedSnippet(unsigned handle);

Removes a previously-uploaded snippet denoted by handle. Returns an error code if the handle is invalid. NOTE: This function is only supported in the SPARC/Solaris distribution.

kptr_t getUploadedAddress(unsigned handle);

Finds where the snippet associated with handle has been uploaded. Returns zero if the handle is invalid. NOTE: This function is only supported in the SPARC/Solaris distribution.

ierr createInstPointAtAddr(kptr_t address, kapi_point *point);

Fills-in the kapi_point object to represent the instrumentation point at the specified kernel address. When instrumenting well-defined points like function entry/exit, it is recommended to use point-location methods of the kapi_function and kapi_basic_block classes instead.

ierr findSpecialPoints(kapi_point_location type, kapi_vector<kapi_point> *points) const;

Finds unusual instrumentation points that may be of interest to users: context switch code, system call path, ... Presently, only the switch-in-out points are implemented. Fills-in the provided vector with points found.

ierr setTestingFlag(bool flag);

Sets a flag within the kerninst daemon that determines whether instrumentation code will actually be spliced into the kernel. If flag is set to true, instrumentation will be generated in kernel memory but will not be spliced into execution. This function is most often used when debugging problems with the generated instrumentation code.

kapi_snippet getPreemptionProtectedCode(const kapi_snippet &unsafe_code);

Returns a snippet containing unsafe_code wrapped with the code necessary to disable preemption. This is necessary if unsafe_code does per-processor variable updates. See also kapi_atomic_assign.

2.1.5 Data Sampling

KAPI provides routines for peeking at kernel data at periodic intervals. The following methods implement this functionality.

struct kapi_mem_region {
    kptr_t addr;
    unsigned nbytes;
};

typedef int (*data_callback_t)(unsigned reqid, uint64_t time_of_sample,
                               const uint64_t *sampled_values, unsigned numvalues);

int sample_periodically(const kapi_vector<kapi_mem_region> &regions, data_callback_t cback, unsigned ms);

Starts copying memory from the kernel periodically, every ms milliseconds. Returns the request handle. The memory to sample is specified as a set of disjoint regions of type kapi_mem_region. When a sample arrives, the provided cback callback will be invoked with the contents of the regions concatenated in one vector. We assume that we are sampling a collection of 64-bit integers, so the data is converted accordingly if the daemon and the client run on different architectures. If ms is zero then sample just once.
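Because the callback receives the contents of all regions concatenated in one vector, a callback usually needs to split the values back per region. The following is a sketch under stated assumptions: mem_region mirrors the layout of kapi_mem_region shown above, std::vector stands in for kapi_vector, kptr_t is assumed to be a 64-bit integer, each region is assumed to be a multiple of 8 bytes long, and the values are assumed to arrive in the order the regions were passed. split_samples is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::uint64_t kptr_t;  // assumption: kernel addresses fit in 64 bits

// Stand-in for kapi_mem_region, with the same two fields.
struct mem_region {
    kptr_t addr;
    unsigned nbytes;
};

// Hypothetical helper: split a concatenated sample into one vector per region.
// Stops early if the sample holds fewer values than the regions require.
std::vector<std::vector<std::uint64_t> >
split_samples(const std::vector<mem_region> &regions,
              const std::uint64_t *sampled_values, unsigned numvalues) {
    assert(sampled_values != nullptr || numvalues == 0);
    std::vector<std::vector<std::uint64_t> > per_region;
    unsigned next = 0;
    for (const mem_region &r : regions) {
        const unsigned count = r.nbytes / sizeof(std::uint64_t);
        if (next + count > numvalues) {
            break;  // short sample: report only the complete regions
        }
        per_region.emplace_back(sampled_values + next,
                                sampled_values + next + count);
        next += count;
    }
    return per_region;
}
```

A data_callback_t implementation could call this helper with the same region list it passed to sample_periodically() and then index the result by region.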

ierr stop_sampling(int handle);

Stops the sampling request given a handle. Returns an error code if the handle is invalid.

ierr adjust_sampling_interval(int handle, unsigned new_ms);

Changes the sampling interval for a given request to new_ms. Two special cases: if new_ms is zero, sampling stops; if the previous interval was zero and new_ms is non-zero, sampling starts. Returns an error code if the handle is invalid.

2.1.6 Handling API events

To support the callback functionality, the API needs to receive control periodically and handle pending events. The programmer's responsibility is to wait or poll to see if there are any events to be handled and invoke the handleEvents() method if so.

ierr handleEvents();

Handles pending events.

KAPI provides two different ways to wait for incoming events: waitForEvents() and getEventQueueDescriptor()/select(). Use whichever better fits your application structure.

ierr waitForEvents(unsigned timeoutMilliseconds);

Waits for incoming events (callbacks and data samples). timeoutMilliseconds specifies the maximum amount of time you want to wait (set it to 0 if you do not want to block or to UINT_MAX to wait forever). Returns 1 if events are pending, 0 if timed out, < 0 on error.

int getEventQueueDescriptor();

Alternative way of waiting for events. Returns a file descriptor that you can add to read_fds and error_fds and call select() on them.
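The select()-based approach can be sketched as follows. This is an illustration under stated assumptions, not KAPI code: wait_for_descriptor is a hypothetical helper that works on any POSIX file descriptor, and its return convention (1 if ready, 0 on timeout, negative on error) merely mirrors the one documented for waitForEvents().

```cpp
#include <cassert>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical helper: wait until fd is readable or in an error state.
// Returns 1 if the descriptor is ready, 0 on timeout, -1 on error.
int wait_for_descriptor(int fd, unsigned timeout_ms) {
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set read_fds;
    fd_set error_fds;
    FD_ZERO(&read_fds);
    FD_ZERO(&error_fds);
    FD_SET(fd, &read_fds);
    FD_SET(fd, &error_fds);

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    const int n = select(fd + 1, &read_fds, nullptr, &error_fds, &tv);
    if (n < 0) {
        return -1;
    }
    return n > 0 ? 1 : 0;
}
```

In a mutator, fd would be the value returned by getEventQueueDescriptor(), and a return value of 1 would be followed by a call to handleEvents(). A tool that already runs its own select() loop can instead add the descriptor to its existing sets.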

2.1.7 Handling Indirect Calls

The code analysis framework of KAPI allows the programmer to walk the static call graph of the kernel by starting at a top-level function, descending into its callees, and so on. See kapi_function::getCallees() for details. Unfortunately, kernel code is full of indirect calls, where targets are not known in advance. As a result, the static approach of kapi_function::getCallees() may not be able to find all callees.

Fortunately, KAPI provides primitives for discovering such callees at run time by instrumenting the corresponding call sites and snooping on destination addresses as the calls are made. To use this feature, the programmer needs to install a watchpoint on a callsite of interest with watchCalleesAt(addr), let the kernel run for a while, remove the watchpoint with unWatchCalleesAt(addr) and retrieve the accumulated callee information with getCalleesForSite().

ierr watchCalleesAt(kptr_t addrFrom);

Collect callee addresses for a given callsite of an indirect call. Use kapi_function::getCallees() to find such callsites for a given function.

ierr unWatchCalleesAt(kptr_t addrFrom);

Stop collecting callee addresses for a given callsite.

ierr getCalleesForSite(kptr_t siteAddr, kapi_vector *calleeAddrs, kapi_vector *calleeCounts);

Fills-in the supplied vectors with callee addresses and execution counts collected so far. The callsite must be in the unwatched state or this method will return an error.
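Once the callee information has been retrieved, a tool will often want the most frequently called target of the site. The following sketch assumes that the two vectors are parallel (the i-th count belongs to the i-th address), uses std::vector as a stand-in for kapi_vector, and assumes 64-bit addresses and unsigned counts. hottest_callee is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint64_t kptr_t;  // assumption: kernel addresses fit in 64 bits

// Hypothetical helper: find the callee with the highest execution count.
// Returns false if the vectors are empty or differ in length.
bool hottest_callee(const std::vector<kptr_t> &callee_addrs,
                    const std::vector<unsigned> &callee_counts,
                    kptr_t *hottest) {
    assert(hottest != nullptr);
    if (callee_addrs.empty() || callee_addrs.size() != callee_counts.size()) {
        return false;
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < callee_addrs.size(); ++i) {
        if (callee_counts[i] > callee_counts[best]) {
            best = i;
        }
    }
    *hottest = callee_addrs[best];
    return true;
}
```

The resulting address could then be passed to kapi_manager::findFunctionByAddress() to obtain the corresponding kapi_function.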

2.1.8 Support for Function Replacement

Function replacement is the ability for a programmer to substitute an alternate implementation of a kernel function for the default implementation. There are two independent methods provided by the KAPI to perform function replacement. Both methods are implemented to assume that the arguments to the original and replacement implementation of the function being replaced are identical. It is suggested that replacement versions of kernel functions be implemented in a kernel module rather than via KAPI snippets to assure correctness with respect to calling conventions and kernel data types.

The first method provided by KAPI, which we refer to as true function replacement, instruments the original function at its very first instruction to place an unconditional control transfer to the replacement. After the instrumentation has been performed, all calls made from anywhere in the kernel to the original function will be redirected to the replacement function. The following two methods are provided to do true function replacement.

int replaceFunction(kptr_t oldFnStartAddr, kptr_t newFnStartAddr);

Replaces the function defined at oldFnStartAddr with the one at newFnStartAddr. The return value is a request id to be used when undoing the replacement.

ierr unreplaceFunction(unsigned reqid);

Nullifies the replacement associated with reqid.

The second method provided by KAPI, which we refer to as callsite function replacement, instruments a single callsite to replace the destination address of the call with the entry address of the replacement function. After the instrumentation has been performed, any time the modified callsite is executed, the replacement function will be called instead of the original. The following two methods are provided to do callsite function replacement.

int replaceFunctionCall(kptr_t callSiteAddr, kptr_t newFnStartAddr);

Replaces the destination address of the call at callSiteAddr with the value newFnStartAddr. The return value is a request id to be used when undoing the callsite replacement.

ierr unreplaceFunctionCall(unsigned reqid);

Nullifies the replacement associated with reqid.

2.1.9 Support for Disassembly

KAPI provides some support for code disassembly. The programmer can disassemble a function, a basic block or an arbitrary region of code. The results are returned as an instance of kapi_disass_object, which is a collection of kapi_disass_chunk instances. Each kapi_disass_chunk represents disassembly of a contiguous region of code and is a collection of kapi_disass_insn objects. Finally, a kapi_disass_insn object contains textual representation of disassembly results for an individual instruction.

ierr getDisassObject(const kapi_function &kfunc, bbid_t bbid,
		     bool useOrigImage, kapi_disass_object *pdis);

Disassembles a function or a basic block within the function. Fills-in the provided kapi_disass_object. Set bbid to 'bbid_t(-1)' to disassemble the entire function. Can disassemble either the current, possibly instrumented, function or the original image, before any instrumentation took place.

ierr getDisassObject(kptr_t start, kptr_t finish, kapi_disass_object *kdis);

Disassembles the specified range of addresses and fills-in the provided kapi_disass_object.

2.1.10 Support for Register Analysis

In order to perform efficient instrumentation, we are required to perform register liveness analysis to identify the registers that must be saved at some instrumentation point before being used by instrumentation. We recognize the usefulness of this information for other purposes as well, and therefore provide the following routine for retrieving the results of the analysis.

ierr getRegAnalysisResults(kptr_t addr, bool beforeInsn,
			   bool globalAnalysis,
			   char *buffer, unsigned maxBytes);

Places into buffer a printable string of register liveness analysis info for the given address addr. The analysis used is a standard reverse-path dataflow analysis, and provides liveness information from the end of the enclosing function to the given address. If beforeInsn is false, we include the effects of the instruction at addr in the analysis. If globalAnalysis is false, the results for this instruction in isolation are reported. If the result string is longer than the user buffer length maxBytes, a negative error code is returned. The string returned is of the following format:

         killed: <list of register names>
         made live: <list of register names>
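A tool that needs the register names individually can parse the returned string. The sketch below follows the two-line format shown above, but the separator between register names is not specified here, so splitting on whitespace and commas is an assumption. liveness_info, split_names and parse_liveness are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two register lists.
struct liveness_info {
    std::vector<std::string> killed;
    std::vector<std::string> made_live;
};

// Split a list of names on whitespace and commas (assumed separators).
static std::vector<std::string> split_names(const std::string &list) {
    std::string cleaned = list;
    for (char &c : cleaned) {
        if (c == ',') {
            c = ' ';
        }
    }
    std::istringstream in(cleaned);
    std::vector<std::string> names;
    std::string name;
    while (in >> name) {
        names.push_back(name);
    }
    return names;
}

// Parse the "killed:" and "made live:" lines of the analysis output.
liveness_info parse_liveness(const std::string &text) {
    liveness_info info;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) {
            continue;  // not a "label: list" line
        }
        const std::string label = line.substr(0, colon);
        const std::string rest = line.substr(colon + 1);
        if (label.find("killed") != std::string::npos) {
            info.killed = split_names(rest);
        } else if (label.find("made live") != std::string::npos) {
            info.made_live = split_names(rest);
        }
    }
    return info;
}
```

The input would be the buffer filled in by getRegAnalysisResults() after a successful call.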

2.1.11 Support for Virtualized Timers

The following methods are used to help support virtualized timers. Unless you fully understand the timer support model in Kerninst, you should not use these functions directly. Refer to the 'Timer & Performance Counter Usage' section of the KAPI Programming Guide for information on using virtualized timers.

kptr_t getAllVtimersAddr();

Returns the address of the in-kernel array of virtual timers.

ierr addTimerToAllVTimers(kptr_t timer_addr);

Appends the vtimer specified by timer_addr to the in-kernel array of virtual timers.

ierr removeTimerFromAllVTimers(kptr_t timer_addr);

Removes the vtimer specified by timer_addr from the in-kernel array of virtual timers.

2.2 Class kapi_module

This class represents a kernel module -- a logically-connected group of functions. We assume that the entire kernel is made of kernel modules, which is true of Solaris. On Linux, there is some portion of code that is only available in the kernel proper, and thus not part of any module. Therefore, on Linux we place all of this code (and the code of any modules compiled into the kernel) in a module named "kernel".


kapi_module();

Constructs an uninitialized instance of kapi_module. The instance will be filled-in by the parent class kapi_manager (via the findModule method).

ierr getName(char *name, unsigned max_len_bytes) const;

Fills-in the module name. Returns not_enough_space if the actual module name is longer than max_len_bytes.

ierr getDescription(char *desc, unsigned max_len_bytes) const;

Some modules have short descriptions defined for them. This method fills-in the module description, if any, into desc. Returns not_enough_space if the actual module description is longer than max_len_bytes.
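Since the caller cannot know the required buffer size in advance, one option is to retry with a larger buffer whenever not_enough_space is returned. The following is a sketch under stated assumptions: fetch_string is a hypothetical helper, the numeric value given to not_enough_space_stub is a placeholder (the real constant is defined in "kapi.h"), and the getter is any callable with the same shape as getName() or getDescription().

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

typedef int ierr;  // assumption: ierr is a plain integer type

// Placeholder for the real not_enough_space code from "kapi.h";
// the value used here is an assumption made for this sketch only.
const ierr not_enough_space_stub = -1;

// Hypothetical helper: call getter with buffers of doubling size until it
// succeeds, fails for another reason, or the size limit is exceeded.
// Returns an empty string on failure.
std::string fetch_string(const std::function<ierr(char *, unsigned)> &getter,
                         unsigned initial_bytes = 16,
                         unsigned limit_bytes = 4096) {
    assert(initial_bytes > 0);
    for (unsigned size = initial_bytes; size <= limit_bytes; size *= 2) {
        std::vector<char> buffer(size);  // zero-filled, so always terminated
        const ierr rc = getter(buffer.data(), size);
        if (rc == 0) {
            return std::string(buffer.data());
        }
        if (rc != not_enough_space_stub) {
            break;  // a different error: retrying will not help
        }
    }
    return std::string();
}
```

With the real API, the getter could be a lambda that forwards to mod.getName(buf, len) for some kapi_module mod, with the comparison made against the real not_enough_space constant.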

unsigned getNumFunctions() const;

Returns the number of functions in the module.

ierr findFunction(const char *func_name, kapi_function *func) const;

Finds a function with this name in the module and fills-in func. Returns an error code if the function was not found.

ierr getAllFunctions(kapi_vector<kapi_function> *vec) const;

Fills-in a vector of all functions in the module.

2.3 Class kapi_function

This class corresponds to a kernel function. It allows the programmer to navigate through the function's resources: basic blocks and instrumentation points.


kapi_function();

Constructs an uninitialized instance of a kapi_function object. The instance should be filled-in later via calls to kapi_module::findFunction, kapi_manager::findFunctionByAddress, or kapi_manager::findModuleAndFunctionByAddress.

kptr_t getEntryAddr() const;

Returns the address of the function's entry point.

ierr getName(char *name, unsigned max_bytes) const;

Fills-in the function name. Returns not_enough_space if the actual name is longer than max_bytes.

unsigned getNumBasicBlocks() const;

Returns the number of basic blocks in the function.

ierr findBasicBlock(kptr_t addr, kapi_basic_block *bb) const;

Finds the basic block starting at addr and fills-in bb. Returns an error code if no basic block starts at addr.

ierr findBasicBlockById(bbid_t bbid, kapi_basic_block *bb) const;

Within the function, all basic blocks are enumerated (0 to N). This method fills-in bb given its sequential id. Returns an error code if bbid is greater than N.

bbid_t getBasicBlockIdByAddress(kptr_t addr) const;

Converts the starting address of a basic block into its sequential id. Returns bbid_t(-1) if no basic block starts at addr.

ierr getAllBasicBlocks(kapi_vector<kapi_basic_block> *vec) const;

Fills-in a vector of all basic blocks in the function.

ierr findEntryPoint(kapi_vector<kapi_point> *points) const;

Fills-in a vector of entry points of the function. Typically, this vector contains only one element.

ierr findExitPoints(kapi_vector<kapi_point> *points) const;

Fills-in a vector of exit points of the function. Contrary to the entry point, this vector can easily contain more than one element.

bool isUnparsed() const;

Some functions can not be analyzed and hence instrumented at this time. Attempts to instrument them will return an error. This function returns true if the function was unanalyzable.

unsigned getNumAliases() const;

There can be several names mapping to the same address. This method returns the number of aliases, including the primary name.

ierr getAliasName(unsigned ialias, char *buffer, unsigned buflen) const;

Fills-in the name of alias number ialias into buffer. Returns an error code if there is no such alias or if there is not enough space to store its name.

ierr getCallees(const kapi_vector<bbid_t> *blocks,
		kapi_vector<kptr_t> *regCallsFromAddr,
		kapi_vector<kptr_t> *regCallsToAddr,
		kapi_vector<kptr_t> *interprocBranchesFromAddr,
		kapi_vector<kptr_t> *interprocBranchesToAddr,
		kapi_vector<kptr_t> *unanalyzableCallsFromAddr) const;

Finds and reports all callees of this function. For each call we try to determine its source address (the address of the call instruction) as well as its destination address (the address of the callee). Regular calls, indirect calls (unanalyzableCallsFromAddr), and interprocedural branches are located. If the blocks argument is not NULL, only calls made in these basic blocks are reported. See "Handling Indirect Calls" for information on how to discover targets of indirect calls.

2.4 Class kapi_basic_block

This class represents a basic block of kernel code.


kapi_basic_block();

Constructs an uninitialized instance of the basic block object. The instance should be filled-in later via calls to kapi_function::findBasicBlock() or kapi_function::findBasicBlockById().

kptr_t getStartAddr() const;

Returns the start address of the basic block.

kptr_t getEndAddrPlus1() const;

Returns the address immediately following the last instruction in the basic block.
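Together, getStartAddr() and getEndAddrPlus1() describe a half-open address range, which makes size and containment checks straightforward. A minimal sketch, assuming kptr_t is a 64-bit integer; block_size_bytes and block_contains are hypothetical helpers that take the two addresses as plain values.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t kptr_t;  // assumption: kernel addresses fit in 64 bits

// Size in bytes of the half-open range [start, end_plus1).
inline kptr_t block_size_bytes(kptr_t start, kptr_t end_plus1) {
    assert(end_plus1 >= start);
    return end_plus1 - start;
}

// True if addr lies inside the half-open range [start, end_plus1).
inline bool block_contains(kptr_t start, kptr_t end_plus1, kptr_t addr) {
    return addr >= start && addr < end_plus1;
}
```

For a kapi_basic_block bb, the arguments would be bb.getStartAddr() and bb.getEndAddrPlus1().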

kptr_t getExitAddr() const;

Returns the exit address of the block, that is, the address to instrument in order to catch the exit from the block. On the SPARC architecture, because of delay slots, this address may or may not correspond to the last instruction in the block.

2.5 Class kapi_point

This class represents an instrumentation point -- a location in the kernel code where we can insert instrumentation.


kapi_point();

Creates an uninitialized instance of the kapi_point object. The instance should be filled-in later via calls to kapi_function::findEntryPoint, kapi_manager::createInstPointAtAddr, etc.

kptr_t getAddress() const;

Returns the address associated with the point.

2.6 Class kapi_snippet and derived classes

kapi_snippet is the base class of every instrumentation code construct in KAPI. This class should never be used directly, as it merely exists to provide a polymorphic equivalence among the derived classes. Typically, the appropriate derived class is used instead.

2.6.1 Class kapi_const_expr

kapi_const_expr(kapi_int_t val);

This class represents a constant integer expression with value val.

2.6.2 Class kapi_int_variable

This class represents an integer variable in the kernel space. On 64-bit architectures, it represents a 64-bit integer. On 32-bit architectures, it represents a 32-bit integer. The class kapi_int64_variable is defined for 32-bit architectures to represent long long integers, and has the same construction methods as kapi_int_variable (described here).

kapi_int_variable(kptr_t addr);

Declares a variable stored at the pre-allocated kernel address specified by addr. Use kapi_manager::malloc(sizeof(kapi_int_t)) to allocate space first.

kapi_int_variable(kptr_t v, const kapi_snippet &index);

Declares an integer scalar variable v[index], which is an element of a vector, starting at address v. The exact location of the variable is determined at run time after the index expression is evaluated.

kapi_int_variable(const kapi_snippet &addr_expr);

Declares an integer variable whose location is specified by addr_expr and will be known only at run time.

2.6.3 Class kapi_arith_expr

This class allows the programmer to construct binary arithmetic expressions.

kapi_arith_expr(kapi_arith_operation kind, const kapi_snippet &lOpd, const kapi_snippet &rOpd);

Declares a binary expression, combining the values of lOpd and rOpd with the operation specified by kind. The following binary operations are currently supported.

enum kapi_arith_operation {

We suggest using kapi_atomic_assign instead of kapi_assign when the host on which kerninstd is running is a multi-processor and all processors are updating the same variable. The atomicity ensures safe updates to the variable being assigned. For per-processor variable updates, we recommend using kapi_manager::getPreemptionProtectedCode.

NOTE: On Power, be careful when using kapi_atomic_assign in instrumenting code that includes Load and Reserve and Store Conditional instructions. kapi_atomic_assign cannot be inside instrumentation that is placed after Load and Reserve, but before the corresponding Store Conditional instruction.

2.6.4 Class kapi_sequence_expr

This class allows the programmer to chain multiple expressions together.

kapi_sequence_expr(const kapi_vector<kapi_snippet> &exprs);

Declares a sequence expression. Expressions in the exprs vector will be evaluated in the order they are stored in the vector.

2.6.5 Class kapi_bool_expr

This class allows the programmer to construct boolean expressions.

kapi_bool_expr(bool value);

Declares a constant boolean expression with given value.

kapi_bool_expr(kapi_relation kind, const kapi_snippet &lOpd, const kapi_snippet &rOpd);

Declares a binary boolean expression, which applies relation kind to the values of lOpd and rOpd. The following relations are currently supported.

enum kapi_relation {

2.6.6 Class kapi_if_expr

This class allows the programmer to construct 'if-then' and 'if-then-else' expressions.

kapi_if_expr(const kapi_bool_expr &cond, const kapi_snippet &trueClause);

Declares an 'if-then' expression. The boolean cond expression is used to decide if trueClause should be executed.

kapi_if_expr(const kapi_bool_expr &cond, const kapi_snippet &trueClause, const kapi_snippet &falseClause);

Declares an 'if-then-else' expression. The boolean cond expression is used to decide whether to execute the trueClause or the falseClause expression.

2.6.7 Class kapi_param_expr

kapi_param_expr(unsigned n);

Declares an expression with value equal to the nth parameter of a function being instrumented. The first parameter corresponds to an n-value equal to zero. This snippet type is valid only at points that are entries to subroutines.

2.6.8 Class kapi_retval_expr

kapi_retval_expr(const kapi_function &func);

Declares an expression with value equal to the return value of func, which is being instrumented. This snippet type is valid only at points that are exits from subroutines.

2.6.9 Class kapi_call_expr

kapi_call_expr(kptr_t entryAddr, const kapi_vector<kapi_snippet> &args);

Declares a call expression to call a function starting at entryAddr and pass the given arguments vector args to it.

2.6.10 Class kapi_ret_expr


Declares a return statement, which is useful for generating subroutines on the fly. Do not confuse this statement with kapi_retval_expr.

2.6.11 Class kapi_hwcounter_expr

kapi_hwcounter_expr(kapi_hwcounter_kind type);

Declares an expression equal to the value of a hardware counter at the moment of evaluation. Note that the types of counters supported are dependent on the hardware architecture of the machine running the KernInst daemon (kerninstd). The following hardware counters are currently supported.

typedef enum {
    HWC_NONE = 0, // No counter selected
    HWC_TICKS = 1, // Processor cycle counter

On SPARC Systems:
    HWC_DCACHE_VREADS = 2,                // D-cache read references
    HWC_DCACHE_VWRITES = 3,               // D-cache write references
    HWC_DCACHE_VREAD_HITS = 4,            // D-cache read hits
    HWC_DCACHE_VWRITE_HITS = 5,           // D-cache write hits
    HWC_ICACHE_VREFS = 6,                 // Instruction cache references
    HWC_ICACHE_VHITS = 7,                 // Instruction cache hits
    HWC_ICACHE_STALL_CYCLES = 8,          // Cycles stalled handling I-cache misses
    HWC_BRANCH_MISPRED_VSTALL_CYCLES = 9, // Cycles stalled handling branch mispredictions
    HWC_ECACHE_VREFS = 10,                // L2 cache references
    HWC_ECACHE_VHITS = 11,                // L2 cache hits
    HWC_ECACHE_VREAD_HITS = 12,           // L2 cache read hits
    HWC_VINSNS = 13                       // Instructions completed

On Intel Pentium 4 or Xeon Systems:
    HWC_ITLB_HIT = 2,                     // Instruction TLB Hits
    HWC_ITLB_UNCACHEABLE_HIT = 3,         // Instruction TLB Hits (uncacheable)
    HWC_ITLB_MISS = 4,                    // Instruction TLB Misses
    HWC_ITLB_MISS_PAGE_WALK = 5,          // Page walks by the page miss handler due to Instruction TLB Misses
    HWC_DTLB_MISS_PAGE_WALK = 6,          // Page walks by the page miss handler due to Data TLB Misses
    HWC_L2_READ_HIT_SHR = 7,              // L2 cache Read Hits (shared)
    HWC_L2_READ_HIT_EXCL = 8,             // L2 cache Read Hits (exclusive)
    HWC_L2_READ_HIT_MOD = 9,              // L2 cache Read Hits (modified)
    HWC_L2_READ_MISS = 10,                // L2 cache Read Misses
    HWC_L3_READ_HIT_SHR = 11,             // L3 cache Read Hits (shared) 
    HWC_L3_READ_HIT_EXCL = 12,            // L3 cache Read Hits (exclusive)
    HWC_L3_READ_HIT_MOD = 13,             // L3 cache Read Hits (modified)
    HWC_L3_READ_MISS = 14,                // L3 cache Read Misses
    HWC_COND_BRANCH_MISPREDICT = 15,      // Conditional jumps mispredicted
    HWC_COND_BRANCH = 16,                 // Conditional jumps predicted
    HWC_CALL_MISPREDICT = 17,             // Indirect calls mispredicted
    HWC_CALL = 18,                        // Direct or indirect calls predicted
    HWC_RET_MISPREDICT = 19,              // Return branches mispredicted
    HWC_RET = 20,                         // Return branches predicted
    HWC_INDIRECT_MISPREDICT = 21,         // Indirect calls, indirect jumps, and returns mispredicted
    HWC_INDIRECT = 22,                    // Indirect calls, indirect jumps, and returns predicted
    HWC_MEM_LOAD = 23,                    // Memory Reads
    HWC_MEM_STORE = 24,                   // Memory Writes
    HWC_L1_LOAD_MISS = 25,                // L1 cache Read Misses
    HWC_DTLB_LOAD_MISS = 26,              // Data TLB Read Misses
    HWC_DTLB_STORE_MISS = 27,             // Data TLB Write Misses
    HWC_INSTR_ISSUED = 28,                // Instructions issued (including re-issues due to bad speculation)
    HWC_INSTR_RETIRED = 29,               // Completed instructions retired (NOTE: only available on P4/Xeon processors with model encoding 3)
    HWC_UOPS_RETIRED = 30,                // Instruction micro-ops retired
    HWC_BRANCH_TAKEN_PREDICT = 31,        // Branches taken predicted
    HWC_BRANCH_TAKEN_MISPREDICT = 32,     // Branches taken mispredicted
    HWC_BRANCH_NOTTAKEN_PREDICT = 33,     // Branches not-taken predicted
    HWC_BRANCH_NOTTAKEN_MISPREDICT = 34,  // Branches not-taken mispredicted
    HWC_PIPELINE_CLEAR = 35               // Instruction Pipeline Flushes

On power4 systems:

    HWC_RUN_CYCLES,                       // Processor cycles gated by the run latch
    HWC_PROCESSOR_CYCLES,                 // Processor cycles
    // NOTE: on Power, HWC_TICKS uses the TB (timebase) register
    HWC_INSTRUCTIONS_COMPLETED,           // Instructions completed
    HWC_L1_DATA_MISS,                     // L1 D-cache misses
    HWC_L2_DATA_INVALIDATION,             // L2 D-cache invalidations
    HWC_INSTRUCTIONS_DISPATCHED,          // Instructions dispatched
    HWC_L1_DATA_STORE,                    // L1 D-cache stores
    HWC_L1_DATA_LOAD,                     // L1 D-cache reads
    HWC_L3_DATA_LOAD,                     // L3 D-cache reads
    HWC_DATA_MEMORY_LOAD,                 // Demand loads from memory
    HWC_L3_DATA_LOAD_MCM,                 // L3 D-cache reads from another MCM
    HWC_L2_DATA_LOAD,                     // L2 D-cache reads
    HWC_L2_DATA_LOAD_SHARED,              // L2 D-cache reads of shared data
    HWC_L2_DATA_LOAD_MCM,                 // L2 D-cache reads from another MCM
    HWC_L2_DATA_LOAD_MODIFIED,            // L2 D-cache reads of modified data
    HWC_L2_DATA_LOAD_MODIFIED_MCM,        // L2 D-cache reads of modified data from another MCM
    HWC_MEMORY_INSTRUCTIONS_LOAD,         // Instructions loaded from memory
    HWC_L2_INSTRUCTIONS_LOAD,             // L2 I-cache reads
    HWC_L2_INSTRUCTIONS_LOAD_MCM,         // L2 I-cache reads from another MCM
    HWC_L3_INSTRUCTIONS_LOAD,             // L3 I-cache reads
    HWC_L3_INSTRUCTIONS_LOAD_MCM,         // L3 I-cache reads from another MCM
    HWC_L1_INSTRUCTIONS_LOAD,             // L1 I-cache reads
    HWC_PREFETCHED_INSTRUCTIONS,          // Prefetched instructions
    HWC_NO_INSTRUCTIONS                   // Cycles when no instructions fetched

} kapi_hwcounter_kind;

To use kapi_hwcounter_expr with kapi_hwcounter_kind values other than HWC_TICKS, you must also use the kapi_hwcounter_set class methods to enable the counter on the target machine. The following example shows the required operations, where type is some valid kapi_hwcounter_kind value:

      // Read the current performance counter configuration
      kapi_hwcounter_set oldPcrVal;
      if(oldPcrVal.readConfig() < 0) {
         cerr << "ERROR: unable to read performance counter settings\n";
         return -1;   // assumes the enclosing function returns an int status
      }

      // Update the settings to include the desired counter type
      kapi_hwcounter_set newPcrVal = oldPcrVal;
      newPcrVal.insert(type);

      // If the settings have changed, write to enable the new counter type
      if(!newPcrVal.equals(oldPcrVal)) {
         if(newPcrVal.writeConfig() < 0) {
            cerr << "ERROR: unable to write performance counter settings\n";
            return -1;
         }
      }

Note that the example above does not account for conflicts between the old and new performance counter settings. If you plan to use multiple types of kapi_hwcounter_expr at the same time, you must use the kapi_hwcounter_set::conflictsWith() method to detect such conflicts. If you wish to use this expression for timing purposes, we suggest that you instead use the KAPI timer library, documented in the 'Timer & Performance Counter Usage' section of the KAPI Programming Guide.
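
One way to account for conflicts is to check conflictsWith() before inserting a counter, and to write the configuration only when nothing would be displaced. The sketch below is illustrative rather than authoritative. It is written as a template so that it compiles without kapi.h. The names counter_set_sketch, kind_sketch, KIND_A through KIND_C, and the two-slot limit are hypothetical stand-ins, not part of the KAPI. The sketch also assumes that conflictsWith() returns true when enabling the counter would force another one out of the set.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical counter kinds; real code would use kapi_hwcounter_kind values.
enum kind_sketch { KIND_A, KIND_B, KIND_C };

// Hypothetical stand-in for kapi_hwcounter_set with two counter slots.
class counter_set_sketch {
public:
    counter_set_sketch() : writes_(0) {}

    bool contains(kind_sketch k) const {
        for (std::size_t i = 0; i < active_.size(); ++i)
            if (active_[i] == k) return true;
        return false;
    }
    // Assumed meaning: true when enabling k would force another counter out.
    bool conflictsWith(kind_sketch k) const {
        return !contains(k) && active_.size() >= kSlots;
    }
    // Returns the displaced kind, or k itself when nothing was displaced
    // (the second case is an assumption of this sketch).
    kind_sketch insert(kind_sketch k) {
        if (contains(k)) return k;
        if (active_.size() < kSlots) { active_.push_back(k); return k; }
        kind_sketch old = active_[0];
        active_[0] = k;
        return old;
    }
    int writeConfig() { ++writes_; return 0; }   // pretend the write succeeds
    int writes() const { return writes_; }

private:
    static const std::size_t kSlots = 2;
    std::vector<kind_sketch> active_;
    int writes_;
};

// Enables kind only if doing so displaces no other counter.
// Returns true when the counter was inserted and the write succeeded.
template <typename CounterSet, typename Kind>
bool enableIfNoConflict(CounterSet& set, Kind kind) {
    if (set.conflictsWith(kind)) return false;   // leave the set untouched
    set.insert(kind);
    return set.writeConfig() >= 0;               // negative means failure
}
```

A mutator that needs several counters at once could call such a helper for each kind and report the ones that were rejected, instead of silently evicting a counter that another expression still depends on.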

2.6.11 Class kapi_cpuid_expr


Declares an expression that evaluates to the physical id of the CPU that executed this code. Note that on certain multiprocessor machines, the physical ids may not be sequential.

2.6.12 Class kapi_pid_expr


Declares an expression that evaluates to the process id (pid) of the executing thread. On the Solaris operating system, kernel threads have a pid of zero.

2.7 Class kapi_vector

This class is a container used to hold objects used by the API. It provides a small subset of the STL vector interface and serves as an intermediary between the programmer and the internal vector representation in KAPI.


Creates an empty vector.

void push_back(const T &item);

Appends item to the end of the vector.

unsigned size() const;

Returns the number of elements in the vector.

reference operator[] (unsigned i);

Returns a reference to the ith element of the vector.

iterator begin();
iterator end();

Return iterators to the start and end, respectively, of the vector.

void clear();

Removes all elements from the vector.
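
The members listed for kapi_vector can be exercised as in the following sketch. Because kapi.h is not available here, the sketch uses a hypothetical stand-in, kapi_vector_sketch, backed by std::vector and mirroring only the documented members. The real class may differ in its typedefs and internals.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for kapi_vector<T>, backed by std::vector. The real
// class comes from kapi.h; only the members documented above are mirrored.
template <typename T>
class kapi_vector_sketch {
public:
    typedef T& reference;
    typedef typename std::vector<T>::iterator iterator;

    kapi_vector_sketch() {}                          // creates an empty vector
    void push_back(const T& item) { data_.push_back(item); }
    unsigned size() const { return static_cast<unsigned>(data_.size()); }
    reference operator[](unsigned i) { assert(i < size()); return data_[i]; }
    iterator begin() { return data_.begin(); }
    iterator end() { return data_.end(); }
    void clear() { data_.clear(); }

private:
    std::vector<T> data_;
};

// Sums the elements twice, once by index and once by iterator, and reports
// whether both traversals saw the same total.
inline bool traversalsAgree(kapi_vector_sketch<int>& v) {
    int byIndex = 0;
    for (unsigned i = 0; i < v.size(); ++i) byIndex += v[i];

    int byIterator = 0;
    for (kapi_vector_sketch<int>::iterator it = v.begin(); it != v.end(); ++it)
        byIterator += *it;

    return byIndex == byIterator;
}
```

Index-based and iterator-based loops are interchangeable here, so code written against the STL vector idioms should carry over with little change.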

2.8 Class kapi_hwcounter_set

This class represents and manipulates the set of hardware counters currently selected on the processor(s) of the target machine.


Constructs an empty counter set. Notice that it does not match the current state of counters in the processor(s). See readConfig().

ierr readConfig();

Synchronizes the set with the processor state by reading what counters are actually enabled there.

kapi_hwcounter_kind insert(kapi_hwcounter_kind kind);

Inserts a counter in the set. Since the number of active counters is typically limited and some counters conflict with each other, the insertion may force another counter out of the set. The old value is returned. Notice that the insert() method does not change the state of counters in the processor(s) -- all changes need to be committed with a later writeConfig().

void free(kapi_hwcounter_kind kind);

Removes the counter with associated kind from the set.

bool conflictsWith(kapi_hwcounter_kind kind) const;

Checks whether the given counter kind can be enabled without conflicts, that is, without forcing another counter out of the set.

ierr writeConfig();

Writes the current selection of counters into the processor(s) state.

2.9 Class kapi_disass_object

This class is the top-level interface to disassembly. It represents a collection of kapi_disass_chunks.


Constructs an uninitialized disassembly object. Use kapi_manager::getDisassObject() to populate it.

const_iterator begin() const;
const_iterator end() const;

Return iterators to the start/end of the kapi_disass_chunk collection.

2.10 Class kapi_disass_chunk

This class represents a collection of kapi_disass_insns. It should not be constructed directly -- kapi_disass_object does that for you.

const_iterator begin() const;
const_iterator end() const;

Return iterators to the start/end of the kapi_disass_insn collection.

2.11 Class kapi_disass_insn

This class contains the textual representation of a disassembled instruction. It should not be constructed directly -- kapi_disass_chunk does that for you.

const char *getDisassembly() const;

Returns the textual representation of the instruction.

const void *getRaw() const;

Returns the binary representation of the instruction.

unsigned getNumBytes() const;

Returns the size of the binary representation.
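
The pointer from getRaw() and the length from getNumBytes() together describe a byte buffer, which a mutator might print next to the textual disassembly. The helper below is a minimal sketch of that idea. The name hexBytes and the output format are illustrative choices, not part of the KAPI.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats n raw bytes as space-separated, two-digit lowercase hex.
// In a mutator, raw and n would come from getRaw() and getNumBytes().
inline std::string hexBytes(const void* raw, unsigned n) {
    assert(raw != 0 || n == 0);                  // a null buffer must be empty
    const unsigned char* p = static_cast<const unsigned char*>(raw);
    std::string out;
    char buf[4];
    for (unsigned i = 0; i < n; ++i) {
        std::snprintf(buf, sizeof buf, "%02x", p[i]);
        if (i != 0) out += ' ';                  // separator between bytes
        out += buf;
    }
    return out;
}
```

The bytes are treated as unsigned char so that values above 0x7f print as two digits rather than as sign-extended integers.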

bool hasDestFunc() const;

Returns true if and only if the instruction is a call whose destination is known.

const char *getDestFuncInfo() const;

If the instruction is a call, this method returns the name of the function it is calling.
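
The three disassembly classes nest: an object holds chunks, and a chunk holds instructions. The sketch below walks that hierarchy and collects the names of called functions, checking hasDestFunc() before asking for the name. The types insn_sketch, chunk_sketch, and object_sketch are hypothetical stand-ins that mirror only the methods documented above. Their constructors and add() methods exist purely so the example can run without kapi.h, and the real iterators may dereference differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for kapi_disass_insn.
class insn_sketch {
public:
    insn_sketch(const std::string& text, const std::string& dest)
        : text_(text), dest_(dest) {}
    const char* getDisassembly() const { return text_.c_str(); }
    bool hasDestFunc() const { return !dest_.empty(); }
    const char* getDestFuncInfo() const { return dest_.c_str(); }
private:
    std::string text_;
    std::string dest_;   // empty when the instruction is not a resolved call
};

// Hypothetical stand-in for kapi_disass_chunk.
class chunk_sketch {
public:
    typedef std::vector<insn_sketch>::const_iterator const_iterator;
    void add(const insn_sketch& i) { insns_.push_back(i); }
    const_iterator begin() const { return insns_.begin(); }
    const_iterator end() const { return insns_.end(); }
private:
    std::vector<insn_sketch> insns_;
};

// Hypothetical stand-in for kapi_disass_object.
class object_sketch {
public:
    typedef std::vector<chunk_sketch>::const_iterator const_iterator;
    void add(const chunk_sketch& c) { chunks_.push_back(c); }
    const_iterator begin() const { return chunks_.begin(); }
    const_iterator end() const { return chunks_.end(); }
private:
    std::vector<chunk_sketch> chunks_;
};

// Walks object -> chunks -> instructions and returns the names of all
// functions called by instructions whose destination is known.
inline std::vector<std::string> calledFunctions(const object_sketch& obj) {
    std::vector<std::string> names;
    for (object_sketch::const_iterator c = obj.begin(); c != obj.end(); ++c) {
        for (chunk_sketch::const_iterator i = c->begin(); i != c->end(); ++i) {
            if (!i->hasDestFunc()) continue;     // skip non-calls, unknown targets
            const char* name = i->getDestFuncInfo();
            assert(name != 0);
            names.push_back(name);
        }
    }
    return names;
}
```

The same double loop could print getDisassembly() for every instruction instead, which gives a plain listing of the disassembled code.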