Health-check

The health-check tool monitors processes in various ways to help identify areas where it is consuming too many resources. One can trace one or more processes (including all their threads and child processes too) for a full story of system activity.

Health-check can monitor: ..and can also dump the stats into a JSON formatted file for later analysis.

Health-check command line options:
Currently just x86 (32 and 64 bit) and ARM are supported.

Health check can be used to either attach to one or more existing running processes using the -p option, or one can specify a command to be run and it will executed and traced. The latter option also allows one to specify the user id to run the command under.

Example Output:
sudo health-check -p camera-app -c -d 60 
CPU usage:
  PID  Process                USR%   SYS%  TOTAL%
  3585 camera-app            18.97  16.93  35.90 (medium load)

Wakeups:
  PID  Process               Wake/Sec Kernel Functions
  3608 camera-app               13.62 (add_timer, OSTimerCallbackWrapper) (high)
  3589 camera-app                3.17 (hrtimer_start_range_ns, hrtimer_wakeup) (moderate)
  3608 camera-app                0.37 (schedule_timeout_uninterruptible, process_timeout) (low)
  3585 camera-app                0.28 (hrtimer_start_range_ns, hrtimer_wakeup) (low)
 Total                          17.43

Context Switches:
  PID  Process                Voluntary   Involuntary     Total
                             Ctxt Sw/Sec  Ctxt Sw/Sec  Ctxt Sw/Sec
  3608 camera-app                1712.07       178.02      1890.09 (high)
  3585 camera-app                1498.67        10.72      1509.39 (high)
  3606 camera-app                 143.25        18.55       161.80 (quite high)
  3667 camera-app                 142.33        18.68       161.02 (quite high)
  3605 camera-app                 141.25        18.32       159.57 (quite high)
  3628 camera-app                 128.12        16.50       144.62 (quite high)
  3587 camera-app                 108.98         0.05       109.03 (quite high)
  3661 camera-app                   6.48        46.30        52.78 (moderate)
  3607 camera-app                  27.75         1.65        29.40 (moderate)
  3589 camera-app                  20.48         0.02        20.50 (moderate)
  3592 camera-app                   0.00         0.00         0.00 (idle)
  3590 camera-app                   0.00         0.00         0.00 (idle)
  3588 camera-app                   0.00         0.00         0.00 (idle)
 Total                           4238.19       308.80      3929.39

File I/O operations:
  PID  Process               Count  Op  Filename
  3585 camera-app               43    O /home/phablet/.config/user-dirs.dirs
  3585 camera-app               41   CW /home/phablet/Pictures/image20130905_0004.jpg
  3585 camera-app               22    W /tmp/Camera App.nS3585
  3585 camera-app               21    C /home/phablet/.config/user-dirs.dirs
  3585 camera-app               21    R /home/phablet/.config/user-dirs.dirs
  3585 camera-app                1  OCR /home/phablet/.config/user-dirs.dirs
 Total                         149

File I/O Operations per second:
  PID  Process                 Open   Close    Read   Write
  3585 camera-app              0.73    1.05    0.37    1.05

System calls traced:
  PID  Process              Syscall               Count    Rate/Sec
  3608 camera-app           ioctl                 25035     417.2470
  3585 camera-app           futex                 13713     228.5484
  3608 camera-app           futex                 12382     206.3652
  3585 camera-app           clock_gettime          9065     151.0823
  3585 camera-app           poll                   6098     101.6326
  3585 camera-app           read                   5929      98.8160
  3608 camera-app           clock_gettime          2949      49.1497
  3608 camera-app           mmap2                  2389      39.8164
  3628 camera-app           ioctl                  1532      25.5332
  3608 camera-app           poll                   1421      23.6832
  3608 camera-app           munmap                 1022      17.0332
  3587 camera-app           epoll_wait              927      15.4499
  3587 camera-app           recvfrom                926      15.4332
  3585 camera-app           stat64                  682      11.3666
  3628 camera-app           getpriority             409       6.8166
  3585 camera-app           recvmsg                 396       6.6000
  3585 camera-app           gettimeofday            319       5.3166
  3628 camera-app           munmap                  279       4.6500
  3585 camera-app           sendmsg                 242       4.0333
  3589 camera-app           futex                   231       3.8500
  3589 camera-app           nanosleep               190       3.1666
  3628 camera-app           clock_gettime           155       2.5833
  3585 camera-app           ioctl                    96       1.6000
  3585 camera-app           fstat64                  88       1.4667
  3628 camera-app           futex                    84       1.4000
  3608 camera-app           read                     82       1.3667
  3608 camera-app           open                     44       0.7333
  3661 camera-app           futex                    30       0.5000
  3585 camera-app           munmap                   24       0.4000
  3661 camera-app           poll                     23       0.3833
  3661 camera-app           read                     23       0.3833
  3661 camera-app           clock_gettime            23       0.3833
  3585 camera-app           fcntl64                  22       0.3667
  3608 camera-app           writev                   22       0.3667
  3585 camera-app           access                   22       0.3667
  3585 camera-app           open                     22       0.3667
  3661 camera-app           mmap2                    22       0.3667
  3585 camera-app           fork                     21       0.3500
  3587 camera-app           mprotect                 18       0.3000
  3587 camera-app           futex                    12       0.2000
  3628 camera-app           dup                      10       0.1667
  3628 camera-app           _llseek                   5       0.0833
  3628 camera-app           fcntl64                   5       0.0833
  3628 camera-app           stat64                    5       0.0833
  3628 camera-app           mmap2                     5       0.0833
  3628 camera-app           fstat64                   5       0.0833
  3628 camera-app           lstat64                   5       0.0833
  3628 camera-app           rename                    5       0.0833
  3628 camera-app           open                      5       0.0833
  3628 camera-app           unlink                    5       0.0833
  3661 camera-app           munmap                    1       0.0167
  3661 camera-app           restart_syscall           1       0.0167
  3592 camera-app           restart_syscall           1       0.0167
  3585 camera-app           restart_syscall           1       0.0167
 Total                                            87028    1450.4563

Top polling system calls:
  PID  Process              Syscall             Rate/Sec   Infinite   Zero     Minimum    Maximum    Average
                                                           Timeouts Timeouts   Timeout    Timeout    Timeout
  3585 camera-app           poll                  101.6326     2974     1853   0.0 sec   25.0 sec  788.5 msec
  3608 camera-app           poll                   23.6832        0     1421   0.0 sec    0.0 sec    0.0 sec 
  3587 camera-app           epoll_wait             15.4499        0        0   5.0 sec    5.0 sec    5.0 sec 
  3589 camera-app           nanosleep               3.1666        0        0 900.0 usec 900.0 usec 900.0 usec
  3661 camera-app           poll                    0.3833       22        1   0.0 sec    0.0 sec    0.0 sec 
 Total                                            144.3156     2996     3275

Distribution of poll timeout times:
                                                            10.0  100.0    1.0   10.0  100.0    1.0   10.0  100.0
                                                    up to    to     to     to     to     to     to     to  or more
                                              Zero    9.9   99.9  999.9    9.9   99.9  999.9    9.9   99.9        Infinite
  PID  Process              Syscall            sec   usec   usec   usec   msec   msec   msec    sec    sec    sec   Wait
  3585 camera-app           poll              1853     -      -      -      29    178    864     24    176     -    2974
  3608 camera-app           poll              1421     -      -      -      -      -      -      -      -      -       0
  3587 camera-app           epoll_wait           0     -      -      -      -      -      -     927     -      -       0
  3589 camera-app           nanosleep            0     -      -     190     -      -      -      -      -      -       0
  3661 camera-app           poll                 1     -      -      -      -      -      -      -      -      -      22

Polling system call analysis:
 camera-app (3585), poll:
          2 immediate timed out calls with zero timeout (non-blocking peeks)
 camera-app (3608), poll:
       1339 immediate timed out calls with zero timeout (non-blocking peeks)
       1297 repeated immediate timed out polled calls with zero timeouts (heavy polling peeks)

Per Process Memory (K):
  PID  Process              Type        Size       RSS       PSS
  3585 camera-app           Stack        136        64        64
  3585 camera-app           Heap      130440     11476     11476
  3585 camera-app           Mapped    131308     44688     24811

Change in memory (K/second):
  PID  Process              Type        Size       RSS       PSS
  3585 camera-app           Heap        0.00      1.80      1.80 (growing)
  3585 camera-app           Mapped      0.00      0.40      0.37 (growing slowly)

Open Network Connections:
 None.
Another example:
 sudo health-check -u king dd if=/dev/zero of=test bs=1k count=100000
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 7.85756 s, 13.0 MB/s
CPU usage (in terms of 1 CPU):
  PID  Process                USR%   SYS% TOTAL%   Duration
 22572 dd                     3.17  56.53  59.69       7.89  (high load)

Page Faults:
  PID  Process                 Minor/sec    Major/sec    Total/sec
 22572 dd                          32.83         0.00        32.83

Wakeups:
 No wakeups detected.

Context Switches:
  PID  Process                Voluntary   Involuntary     Total
                             Ctxt Sw/Sec  Ctxt Sw/Sec  Ctxt Sw/Sec
 22572 dd                       50720.29        45.25     50765.54 (very high)

File I/O operations:
  PID  Process               Count  Op  Filename
 22572 dd                    96985    W /home/king/repos/health-check/test
 22572 dd                       14    C /lib/x86_64-linux-gnu/libnss_files-2.17.so
 22572 dd                        6    O /etc/ld.so.cache
 22572 dd                        5   OR /bin/dd
 22572 dd                        1    R /etc/locale.alias
 22572 dd                        1   CW /home/king/repos/health-check/test
 Total                       97012
 Op: O=Open, R=Read, W=Write, C=Close

File I/O Operations per second:
  PID  Process                 Open   Close    Read   Write
 22572 dd                      1.39    1.90    0.76 12292.09

System calls traced:
  PID  Process              Syscall               Count    Rate/Sec
 22572 dd                   read                 100005   12674.7186
 22572 dd                   write                100003   12674.4651
 22572 dd                   mmap                     17       2.1546
 22572 dd                   close                    13       1.6476
 22572 dd                   open                     12       1.5209
 22572 dd                   fstat                     8       1.0139
 22572 dd                   mprotect                  8       1.0139
 22572 dd                   rt_sigaction              6       0.7604
 22572 dd                   access                    5       0.6337
 22572 dd                   brk                       3       0.3802
 22572 dd                   dup2                      2       0.2535
 22572 dd                   munmap                    2       0.2535
 22572 dd                   arch_prctl                1       0.1267
 22572 dd                   exit_group                1       0.1267
 22572 dd                   set_tid_address           1       0.1267
 22572 dd                   futex                     1       0.1267
 22572 dd                   execve                    1       0.1267
 22572 dd                   lseek                     1       0.1267
 22572 dd                   rt_sigprocmask            1       0.1267
 22572 dd                   getrlimit                 1       0.1267
 22572 dd                   set_robust_list           1       0.1267
 Total                                           200093   25359.9567

Filesystem Syncs:
 None.

Per Process Memory (K):
  PID  Process              Type        Size       RSS       PSS
 22572 dd                   Stack        136        20        20
 22572 dd                   Heap          60        44        44
 22572 dd                   Mapped     15500       876       158

Change in memory (K/second):
  PID  Process              Type        Size       RSS       PSS
 22572 dd                   Stack       0.00      1.52      1.52 (growing)
 22572 dd                   Heap        7.60      5.58      5.58 (growing moderately fast)
 22572 dd                   Mapped   1936.09    110.01     19.01 (growing very fast)

Heap Change via brk():
  PID                        brk Count  Change (K)  Rate (K/second)
 22572 dd                            3        29252       3707.42 (growing very fast)

Memory Change via mmap() and munmap():
  PID                          mmaps  munmaps   Change (K)  Rate (K/second)
 22572 dd                         17        2       15263       1934.46 (growing fast)

Open Network Connections:
 None.
Source and packages: Tracing Overhead:

health-check uses ptrace to monitor system calls with a single ptrace thread monitoring each thread of the programme being executed. This will slow down system call intensive applications. Applications that do infrequent system calls the overhead is very low. Instrumenting a tight loop running zero timeout poll() system calls, health-check system call tracing slows the program down by a factor of 125 (on ARMv7) and 95 (on Intel i5).
Running health-check while playing SD video via a flash plugin in firefox while tracing 63 threads, health check doubled the CPU consumption in an Intel i5.

Author: colin.king@canonical.com

Coverity Scan Build Status