Callgrind: A call-graph generating Cache Simulator and Profiler

Last updated for Version 0.9.8

Callgrind (previously named Calltree) is a Valgrind Tool that runs applications under its supervision to generate profiling data. Additionally, two command line tools (Perl scripts) are provided:

  • callgrind_annotate
    To make use of the data produced by the profiler, this script loads the dumps and outputs sorted lists of functions, optionally with annotation.

  • callgrind_control
    This tool lets you interactively observe and control the status of applications currently running under supervision. You can get statistics, the current stack trace, and request zeroing of counters and dumping of profiles.
To use this tool, you must specify --tool=callgrind on the Valgrind command line or use the supplied script callgrind.

This tool is heavily based on the Cachegrind Tool of the Valgrind package. Read the documentation of Cachegrind first; this page only describes the features supported in addition to Cachegrind's features.

Detailed technical documentation on how Callgrind works is available here. If you want to know how to use it, you only need to read this page.

1. Purpose

1.1 Profiling as Part Of Application Development

When you develop a program, one of the last steps is usually to make it as fast as possible (while still correct). You don't want to waste your time optimizing rarely used functions, so you need to know in which parts of your program most of the time is spent.

This is done with a technique called profiling. The program is run under the control of a profiling tool, which gives you the distribution of time among the functions executed in the run. After examining the program's profile, you know where to optimize, and afterwards you verify the success of the optimization with another profile run.

1.2 Profiling Tools

Best known is the GCC profiling tool GProf: you compile your program with option "-pg"; running the program generates a file "gmon.out", which can be transformed into human-readable form with the command line tool "gprof". A disadvantage is the compilation step needed to prepare the executable, which also has to be statically linked.

Another profiling tool is Cachegrind, part of Valgrind. It uses the processor emulation of Valgrind to run the executable, and catches all memory accesses for the trace. The user program does not need to be recompiled; it can use shared libraries and plugins, and the profile measuring itself doesn't distort the results. The trace includes the number of instruction/data memory accesses and 1st/2nd level cache misses, and relates them to the source lines and functions of the run program. A disadvantage is the slowdown of the processor emulation: around 50 times slower.

Cachegrind can only deliver a flat profile: no call relationships among the functions of an application are stored. Thus, inclusive costs, i.e. the costs of a function including the costs of all functions called from it, can't be calculated. Callgrind extends Cachegrind by storing call relationships and the exact event counts spent while doing a call.

Because Callgrind is based on simulation, the slowdown caused by processing the collected events does not influence the results. See the next chapter for more details on the possibilities.

2. Usage

2.1 Basics

To start a profile run for a program, execute
    callgrind [options] program [program options]
After program termination, a profile dump file named "callgrind.out.pid" is generated, with pid being the process ID of the profiled run.

This will collect information

  1. on the memory accesses of your program, and whether an access can be satisfied by the 1st/2nd level cache,
  2. on the calls made in your program among the functions executed.

If you are only interested in the first item, it's enough to use Cachegrind from Valgrind. If you are only interested in the second, use Callgrind with option "--simulate-cache=no". This will only count events of type Instruction Read Accesses, but it significantly speeds up profiling, typically by a factor of 2 or 3. If the program section you want to profile is somewhere in the middle of the run, it is beneficial to fast-forward to this section without any profiling at all, and switch it on later. This is achieved by using "--instr-atstart=no" and interactively running "callgrind_control -i on" before the interesting code section is about to be executed.

2.2 Multiple dumps from one program run

Often, you aren't interested in the time characteristics of a full program run, but only in a small part of it (e.g. the execution of one algorithm). If there are multiple algorithms, or one algorithm running on different input data, it can even be useful to get separate profile information for multiple parts of one program run.

The generated dump files are named

    callgrind.out.pid[.part][-threadID]
where pid is the PID of the running program, part is a number incremented on each dump (".part" is skipped for the dump at program termination), and threadID is a thread identifier ("-threadID" is only used if you request dumps of individual threads).

There are different ways to generate multiple profile dumps while a program is running under the supervision of Callgrind. However, all methods trigger the same action: "dump all profile information since the last dump or program start, and zero the cost counters afterwards". To allow zeroing cost counters without dumping, there is a second action: "zero all cost counters now". The different methods are:

  • Dump on program termination. This method is the standard way and doesn't need any special action from your side.

  • Spontaneous, interactive dumping. Use
      callgrind_control -dump [hint [PID/Name]]
    to request the dumping of profile information of the supervised application with the given PID or name. hint is an arbitrary string you can optionally specify to be able to distinguish profile dumps later. The control program will not terminate before the dump is completely written. Note that the application must be actively running for the dump command to be detected: for a GUI application, resize the window; for a server, send a request.

    If you are using KCachegrind for browsing profile information, you can use the toolbar button "Force dump". This will create the file "cachegrind.cmd" and trigger a reload after the dump is written.

  • Periodic dumping after execution of a specified number of basic blocks. For this, use the command line option --dumps=count. The resolution of the internal basic block counter of Valgrind is only rough, so you should specify an interval of at least 50000 basic blocks.
  • Dumping at the entry/exit of all functions whose name starts with funcprefix. Use the options --dump-before=funcprefix and --dump-after=funcprefix. To zero cost counters before entering a function, use --zero-before=funcprefix. The prefix method for specifying function names was chosen to ease use with C++: you don't have to specify full signatures.

    You can specify these options multiple times for different function prefixes.

  • Program-controlled dumping. Put "#include <valgrind/callgrind.h>" into your source and add "CALLGRIND_DUMP_STATS;" wherever you want a dump to happen. Use "CALLGRIND_ZERO_STATS;" to only zero the cost counters.

    In Valgrind terminology, these macros are called "client requests". They generate a special instruction pattern with no effect at all (i.e. a NOP). Only when run under Valgrind does the CPU simulation engine detect the special instruction pattern and trigger actions like the ones described above.

If you are running a multi-threaded application and specify the command line option "--dump-threads=yes", every thread is profiled on its own and creates its own profile dump. Thus, the last two methods will only generate one dump, for the currently running thread. With the other methods, you will get multiple dumps (one per thread) on a dump request.

2.3 Limiting range of event collection

You can control for which part of your program you want to collect event costs by using --toggle-collect=funcprefix. This will toggle the collection state on entering and leaving a function. When specifying this option, the default collection state at program start is "off". Thus, only events happening while running inside funcprefix will be collected. Recursive calls of funcprefix don't influence the collection state at all.

2.4 Avoiding cycles

A group of functions in which any two members are connected by some call chain is called a cycle. E.g. with A calling B, B calling C, and C calling A, the three functions A, B, C form one cycle.

If a call chain goes around a cycle multiple times, you can't distinguish costs coming from the first round from those of the second. Thus, it makes no sense to attach any cost to a call among the functions of one cycle: if "A > B" appears multiple times in a call chain, there is no way to partition the one big sum over all appearances of "A > B". Thus, for profile data presentation, all functions of a cycle are treated as one big virtual function.

Unfortunately, if you have an application using some callback mechanism (like any GUI program), or even ordinary polymorphism (as in OO languages like C++), it's quite possible to get large cycles. As it is often impossible to say anything about performance behaviour inside of cycles, it is useful to have mechanisms to avoid cycles in call graphs altogether. This is done either by treating the same function as different functions depending on the current execution context, giving them different names, or by ignoring calls to certain functions entirely.

Calls to a function can be ignored with "--fn-skip=funcprefix". E.g., you usually don't want to see the trampoline functions in the PLT sections used for calls to functions in shared libraries; you can see the difference if you profile with "--skip-plt=no". If a call is ignored, the cost events happening in it are attached to the enclosing function.

If you have a recursive function, you can distinguish the first 10 recursion levels by specifying "--fn-recursion10=funcprefix", or for all functions with "--fn-recursion=10", but the latter will give you much bigger profile dumps. In the profile data, you will see the recursion levels of "func" as different functions named "func", "func'2", "func'3" and so on.

If you have call chains "A > B > C" and "A > C > B" in your program, you usually get a "false" cycle "B <> C". Use "--fn-caller2=B --fn-caller2=C", and functions "B" and "C" will be treated as different functions depending on the direct caller. Using the apostrophe for appending this "context" to the function name, you get "A > B'A > C'B" and "A > C'A > B'C", and there will be no cycle. Use "--fn-callers=3" to get a 2-caller dependency for all functions. Again, this will multiply the profile data size.

3. Command line option reference

--base=<prefix>

    Specify another base name for the dump file names. This defaults to "cachegrind.out". To distinguish different profile runs of the same application, ".<pid>" is appended to the base dump file name, with <pid> being the process ID of the profile run (with multiple dumps happening, the file name is modified further; see section 2.2).

    This option is especially useful if your application changes its working directory. Usually, the dump file is generated in the current working directory of the application at program termination. By giving an absolute path as the base specification, you can force a fixed directory for the dump files.

--simulate-cache=yes|no

    Specify if you want to do full cache simulation. Default is yes. If you say no, only instruction read accesses will be profiled. This typically makes the execution at least twice as fast.

    Note, however, that it is impossible to estimate how much real time your program will need from instruction read counts alone. Use this if you want to find out how many times different functions are called and their call relations.

--instr-atstart=yes|no

    Specify if you want Callgrind to start simulation and profiling from the beginning of the run. If not, Callgrind will not collect any information, including calls, but will have at most a slowdown of around 4, the minimum Valgrind overhead. Instrumentation can be interactively switched on via
      callgrind_control -i on.
    Note that the resulting call graph will most probably not contain main, but only the functions executed after instrumentation was switched on. Instrumentation can also be switched on/off programmatically. See the Callgrind include file <callgrind.h> for the macros you have to use in your source code.

    For cache simulation, results will be a little off when switching on instrumentation later in the program run, as the simulator starts with an empty cache at that moment. To cope with this error, switch on event collection a little later still.

--collect-atstart=yes|no

    Specify whether event collection is switched on at beginning of the profile run. This defaults to yes.

    To only look at parts of your program, you have two possibilities:

    • Zero event counters before entering the program part you want to profile, and dump the event counters to a file after leaving that program part.
    • Switch on/off collection state as needed to only see event counters happening while inside of the program part you want to profile.
    The second possibility is preferable if the program part you want to profile is called many times; option 1, i.e. creating a lot of dumps, is not practical here.

    Collection state can be toggled at the entry and exit of a given function with the option --toggle-collect=<function>. For this, collection state should be switched off at the beginning. Note that specifying --toggle-collect implicitly sets --collect-atstart=no.

    Collection state can also be toggled by using a Valgrind client request in your application. For this, include valgrind/callgrind.h and place the macro CALLGRIND_TOGGLE_COLLECT at the needed positions. This only has an effect when run under supervision of the Callgrind tool.

--skip-plt=no|yes

    Ignore calls to/from PLT sections. Defaults to yes.
--fn-skip=<function>

    Ignore calls to/from a given function. E.g. if you have a call chain A > B > C, and you specify function B to be ignored, you will only see A > C.

    This is very convenient for skipping functions handling callback behaviour. E.g. for the signal/slot mechanism in Qt, you only want to see the function emitting a signal calling the slots connected to that signal. First determine the real call chain to see which functions need to be skipped, then use this option.

--fn-group<number>=<function>

    Put a function into separation group <number>.
--fn-recursion<number>=<function>

    Separate <number> recursions for <function>
--fn-caller<number>=<function>

    Separate <number> callers for <function>
--dump-before=<function>

    Dump when entering <function>
--zero-before=<function>

    Zero all costs when entering <function>
--dump-after=<function>

    Dump when leaving <function>
--toggle-collect=<function>

    Toggle collection on enter/leave <function>
--fn-recursion=<level>

    Separate function recursions, maximal <level> [2]
--fn-caller=<callers>

    Separate functions by callers [0]
--mangle-names=no|yes

    Mangle separation into names? [yes]
--dump-threads=no|yes

    Dump traces per thread? [no]
--compress-strings=no|yes

    Compress strings in profile dump? [yes]
--dump-bbs=no|yes

    Dump basic block info? [no]. This needs an update of the KCachegrind importer!
--dumps=<count>

    Dump trace each <count> basic blocks [0=never]
--dump-instr=no|yes

    This specifies the granularity of the profile information. Note that if you dump at instruction level, ct_annotate currently is not able to show you the data. You have to use KCachegrind to get annotated disassembled code. [no]
--trace-jump=no|yes

    This specifies whether information for (conditional) jumps should be collected. Same as above, ct_annotate currently is not able to show you the data. You have to use KCachegrind to get jump arrows in the annotated code. [no]

4. Profile data file format

A profile data file starts with a header: an arbitrary number of lines of the format "key: value". Afterwards, position specifications "spec=position" and cost lines can appear. A cost line starts with a number of position columns (as given by the "positions:" header field), followed by space-separated cost numbers. Empty lines are always allowed.

Possible key values for the header are:

  • version: major.minor [Callgrind]
    This is used to distinguish future trace file formats. A major version of 0 or 1 is supposed to be upward compatible with the Cachegrind 1.0.x format. It is optional; if it does not appear, the original Cachegrind 1.0.x format is assumed. Otherwise, this has to be the first header line.
  • pid: process id [Callgrind]
    This specifies the process ID of the supervised application for which this profile was generated.
  • cmd: program name + args [Cachegrind]
    This specifies the full command line of the supervised application for which this profile was generated.
  • part: number [Callgrind]
    This specifies a sequentially incremented number for each dump generated, starting at 1.
  • desc: type: value [Cachegrind]
    This specifies various information for this dump. For some types, the semantic is defined, but any description type is allowed. Unknown types should be ignored.

    There are the types "I1 cache", "D1 cache", "L2 cache", which specify parameters used for the cache simulator. These are the only types originally used by Cachegrind. Additionally, Callgrind uses the following types: "Timerange" gives a rough range of the basic block counter, for which the cost of this dump was collected. Type "Trigger" states the reason of why this trace was generated. E.g. program termination or forced interactive dump.

  • positions: [instr] [line] [Callgrind]
    For cost lines, this defines the semantics of the leading numbers. Any combination of "instr", "bb" and "line" is allowed, but they have to appear in this order, which corresponds to the order of the position numbers at the start of the cost lines later in the file.

    If "instr" is specified, the position is the address of an instruction whose execution raised the events given later on the line. This address is relative to the base of the binary/shared library file, so that no relocation info has to be specified. For "line", the position is the line number of the source file which is responsible for the events raised. Note that the mapping between "instr" and "line" positions is given by the debug line information produced by the compiler.

    This field is optional. If not specified, only "line" is assumed.

  • events: event type abbreviations [Cachegrind]
    A list of short names of the event types logged in this file. The order is the same as in cost lines. The first event type is the second or third number in a cost line, depending on the value of "positions". Callgrind does not add additional cost types. Specify exactly once.

    Cost types from original Cachegrind are

    • Ir
      Instruction read access
    • I1mr
      Instruction Level 1 read cache miss
    • I2mr
      Instruction Level 2 read cache miss
    • ...
  • summary: costs [Callgrind]
  • totals: costs [Cachegrind]
    The value is the total number of events covered by this trace file. Both keys have the same meaning, but the "totals:" line appears at the end of the file, while "summary:" appears in the header. The summary line was added to allow postprocessing tools to know the total cost in advance. The two lines always give the same cost counts.
As said above, there are also lines of the form "spec=position". The values for position specifications are arbitrary strings. When starting with "(" and a digit, it is a string in compressed format; otherwise it is the real position string. This allows file and symbol names to be used as position strings, as these never start with "(" followed by a digit. The compressed format is either "(" number ")" space position, or only "(" number ")". The first form makes (number) an alias for position from this line to the end of the file, in the context of the given specification type; the second form refers back to such an alias. Compressed format is always optional.

Position specifications allowed:

  • ob= [Callgrind]
    The ELF object where the cost of next cost lines happens.
  • fl= [Cachegrind]
  • fi= [Cachegrind]
  • fe= [Cachegrind]
    The source file containing the code which is responsible for the cost of the next cost lines. "fi="/"fe=" are used when the source file changes inside a function, i.e. for inlined code.
  • fn= [Cachegrind]
    The name of the function where the cost of next cost lines happens.
  • cob= [Callgrind] The ELF object of the target of the next call cost lines.
  • cfl= [Callgrind] The source file including the code of the target of the next call cost lines.
  • cfn= [Callgrind] The name of the target function of the next call cost lines.
  • calls= [Callgrind] The number of non-recursive calls which are responsible for the cost specified by the next call cost line.

    After "calls=" there MUST be a cost line. It gives the cost spent inside the called function; its first number is the source line from which the call happened.

  • jump=count target position [Callgrind] Unconditional jump, executed count times, to the given target position.
  • jcnd=exe.count jumpcount target position [Callgrind] Conditional jump, executed exe.count times with jumpcount jumps to the given target position.
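To tie the pieces together, here is a small hand-written example dump (not produced by a real run; the numbers are invented but internally consistent): main has a self cost of 4 Ir events, calls square once from line 16 with an inclusive cost of 12 events, and square itself costs 12 events at line 20:

```
version: 1
pid: 4711
cmd: ./demo
part: 1
positions: line
events: Ir
summary: 16

fl=demo.c
fn=main
15 4
cfn=square
calls=1 20
16 12

fn=square
20 12

totals: 16
```

Note that the summary/totals count each event exactly once (4 + 12 = 16); the cost line after "calls=" repeats square's cost as the inclusive cost of the call site and is not added again.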
