Health-check
The health-check tool monitors processes in various ways to help identify
areas where it is consuming too many resources. One can trace one or more
processes (including all their threads and child processes too) for a full
story of system activity.
Health-check can monitor:
- CPU usage
- Kernel wake-up events
- File I/O activity (open,read,write,close)
- System call activity
- Excessive polling of timeout wait blocked system calls (such as poll, select, etc)
- Memory usage (such as heap and stack growth)
- Network connections (to spot rogue internet activity)
- Network usage (send/receive) accounting
- Syncing data via fsync, fdatasync, syncfs and sync system calls
- Page fault accounting
..and can also dump the stats into a JSON formatted file for later analysis.
Health-check command line options:
- -b brief (simple mode) output
- -c trace all child processes
- -d duration of the test in seconds
- -f follow fork/vfork/clone system calls
- -h show help information
- -p comma separated list of process IDs or process names to trace
- -m maximum number of system calls to trace before stopping (default 1 million)
- -o JSON output file
- -r resolve IP addresses (may take a while to do)
- -u run a command to trace as a specified user
- -v verbose output
- -w monitor wakelock counts (lightweight fnotify monitoring)
- -W monitor wakelock usage (expensive syscall inspection overhead)
Currently just x86 (32 and 64 bit) and ARM are supported.
Health check can be used to either attach to one or more existing running processes using the -p option, or
one can specify a command to be run and it will executed and traced. The latter option also allows one to
specify the user id to run the command under.
Example Output:
sudo health-check -p camera-app -c -d 60
CPU usage:
PID Process USR% SYS% TOTAL%
3585 camera-app 18.97 16.93 35.90 (medium load)
Wakeups:
PID Process Wake/Sec Kernel Functions
3608 camera-app 13.62 (add_timer, OSTimerCallbackWrapper) (high)
3589 camera-app 3.17 (hrtimer_start_range_ns, hrtimer_wakeup) (moderate)
3608 camera-app 0.37 (schedule_timeout_uninterruptible, process_timeout) (low)
3585 camera-app 0.28 (hrtimer_start_range_ns, hrtimer_wakeup) (low)
Total 17.43
Context Switches:
PID Process Voluntary Involuntary Total
Ctxt Sw/Sec Ctxt Sw/Sec Ctxt Sw/Sec
3608 camera-app 1712.07 178.02 1890.09 (high)
3585 camera-app 1498.67 10.72 1509.39 (high)
3606 camera-app 143.25 18.55 161.80 (quite high)
3667 camera-app 142.33 18.68 161.02 (quite high)
3605 camera-app 141.25 18.32 159.57 (quite high)
3628 camera-app 128.12 16.50 144.62 (quite high)
3587 camera-app 108.98 0.05 109.03 (quite high)
3661 camera-app 6.48 46.30 52.78 (moderate)
3607 camera-app 27.75 1.65 29.40 (moderate)
3589 camera-app 20.48 0.02 20.50 (moderate)
3592 camera-app 0.00 0.00 0.00 (idle)
3590 camera-app 0.00 0.00 0.00 (idle)
3588 camera-app 0.00 0.00 0.00 (idle)
Total 4238.19 308.80 3929.39
File I/O operations:
PID Process Count Op Filename
3585 camera-app 43 O /home/phablet/.config/user-dirs.dirs
3585 camera-app 41 CW /home/phablet/Pictures/image20130905_0004.jpg
3585 camera-app 22 W /tmp/Camera App.nS3585
3585 camera-app 21 C /home/phablet/.config/user-dirs.dirs
3585 camera-app 21 R /home/phablet/.config/user-dirs.dirs
3585 camera-app 1 OCR /home/phablet/.config/user-dirs.dirs
Total 149
File I/O Operations per second:
PID Process Open Close Read Write
3585 camera-app 0.73 1.05 0.37 1.05
System calls traced:
PID Process Syscall Count Rate/Sec
3608 camera-app ioctl 25035 417.2470
3585 camera-app futex 13713 228.5484
3608 camera-app futex 12382 206.3652
3585 camera-app clock_gettime 9065 151.0823
3585 camera-app poll 6098 101.6326
3585 camera-app read 5929 98.8160
3608 camera-app clock_gettime 2949 49.1497
3608 camera-app mmap2 2389 39.8164
3628 camera-app ioctl 1532 25.5332
3608 camera-app poll 1421 23.6832
3608 camera-app munmap 1022 17.0332
3587 camera-app epoll_wait 927 15.4499
3587 camera-app recvfrom 926 15.4332
3585 camera-app stat64 682 11.3666
3628 camera-app getpriority 409 6.8166
3585 camera-app recvmsg 396 6.6000
3585 camera-app gettimeofday 319 5.3166
3628 camera-app munmap 279 4.6500
3585 camera-app sendmsg 242 4.0333
3589 camera-app futex 231 3.8500
3589 camera-app nanosleep 190 3.1666
3628 camera-app clock_gettime 155 2.5833
3585 camera-app ioctl 96 1.6000
3585 camera-app fstat64 88 1.4667
3628 camera-app futex 84 1.4000
3608 camera-app read 82 1.3667
3608 camera-app open 44 0.7333
3661 camera-app futex 30 0.5000
3585 camera-app munmap 24 0.4000
3661 camera-app poll 23 0.3833
3661 camera-app read 23 0.3833
3661 camera-app clock_gettime 23 0.3833
3585 camera-app fcntl64 22 0.3667
3608 camera-app writev 22 0.3667
3585 camera-app access 22 0.3667
3585 camera-app open 22 0.3667
3661 camera-app mmap2 22 0.3667
3585 camera-app fork 21 0.3500
3587 camera-app mprotect 18 0.3000
3587 camera-app futex 12 0.2000
3628 camera-app dup 10 0.1667
3628 camera-app _llseek 5 0.0833
3628 camera-app fcntl64 5 0.0833
3628 camera-app stat64 5 0.0833
3628 camera-app mmap2 5 0.0833
3628 camera-app fstat64 5 0.0833
3628 camera-app lstat64 5 0.0833
3628 camera-app rename 5 0.0833
3628 camera-app open 5 0.0833
3628 camera-app unlink 5 0.0833
3661 camera-app munmap 1 0.0167
3661 camera-app restart_syscall 1 0.0167
3592 camera-app restart_syscall 1 0.0167
3585 camera-app restart_syscall 1 0.0167
Total 87028 1450.4563
Top polling system calls:
PID Process Syscall Rate/Sec Infinite Zero Minimum Maximum Average
Timeouts Timeouts Timeout Timeout Timeout
3585 camera-app poll 101.6326 2974 1853 0.0 sec 25.0 sec 788.5 msec
3608 camera-app poll 23.6832 0 1421 0.0 sec 0.0 sec 0.0 sec
3587 camera-app epoll_wait 15.4499 0 0 5.0 sec 5.0 sec 5.0 sec
3589 camera-app nanosleep 3.1666 0 0 900.0 usec 900.0 usec 900.0 usec
3661 camera-app poll 0.3833 22 1 0.0 sec 0.0 sec 0.0 sec
Total 144.3156 2996 3275
Distribution of poll timeout times:
10.0 100.0 1.0 10.0 100.0 1.0 10.0 100.0
up to to to to to to to to or more
Zero 9.9 99.9 999.9 9.9 99.9 999.9 9.9 99.9 Infinite
PID Process Syscall sec usec usec usec msec msec msec sec sec sec Wait
3585 camera-app poll 1853 - - - 29 178 864 24 176 - 2974
3608 camera-app poll 1421 - - - - - - - - - 0
3587 camera-app epoll_wait 0 - - - - - - 927 - - 0
3589 camera-app nanosleep 0 - - 190 - - - - - - 0
3661 camera-app poll 1 - - - - - - - - - 22
Polling system call analysis:
camera-app (3585), poll:
2 immediate timed out calls with zero timeout (non-blocking peeks)
camera-app (3608), poll:
1339 immediate timed out calls with zero timeout (non-blocking peeks)
1297 repeated immediate timed out polled calls with zero timeouts (heavy polling peeks)
Per Process Memory (K):
PID Process Type Size RSS PSS
3585 camera-app Stack 136 64 64
3585 camera-app Heap 130440 11476 11476
3585 camera-app Mapped 131308 44688 24811
Change in memory (K/second):
PID Process Type Size RSS PSS
3585 camera-app Heap 0.00 1.80 1.80 (growing)
3585 camera-app Mapped 0.00 0.40 0.37 (growing slowly)
Open Network Connections:
None.
Another example:
sudo health-check -u king dd if=/dev/zero of=test bs=1k count=100000
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 7.85756 s, 13.0 MB/s
CPU usage (in terms of 1 CPU):
PID Process USR% SYS% TOTAL% Duration
22572 dd 3.17 56.53 59.69 7.89 (high load)
Page Faults:
PID Process Minor/sec Major/sec Total/sec
22572 dd 32.83 0.00 32.83
Wakeups:
No wakeups detected.
Context Switches:
PID Process Voluntary Involuntary Total
Ctxt Sw/Sec Ctxt Sw/Sec Ctxt Sw/Sec
22572 dd 50720.29 45.25 50765.54 (very high)
File I/O operations:
PID Process Count Op Filename
22572 dd 96985 W /home/king/repos/health-check/test
22572 dd 14 C /lib/x86_64-linux-gnu/libnss_files-2.17.so
22572 dd 6 O /etc/ld.so.cache
22572 dd 5 OR /bin/dd
22572 dd 1 R /etc/locale.alias
22572 dd 1 CW /home/king/repos/health-check/test
Total 97012
Op: O=Open, R=Read, W=Write, C=Close
File I/O Operations per second:
PID Process Open Close Read Write
22572 dd 1.39 1.90 0.76 12292.09
System calls traced:
PID Process Syscall Count Rate/Sec
22572 dd read 100005 12674.7186
22572 dd write 100003 12674.4651
22572 dd mmap 17 2.1546
22572 dd close 13 1.6476
22572 dd open 12 1.5209
22572 dd fstat 8 1.0139
22572 dd mprotect 8 1.0139
22572 dd rt_sigaction 6 0.7604
22572 dd access 5 0.6337
22572 dd brk 3 0.3802
22572 dd dup2 2 0.2535
22572 dd munmap 2 0.2535
22572 dd arch_prctl 1 0.1267
22572 dd exit_group 1 0.1267
22572 dd set_tid_address 1 0.1267
22572 dd futex 1 0.1267
22572 dd execve 1 0.1267
22572 dd lseek 1 0.1267
22572 dd rt_sigprocmask 1 0.1267
22572 dd getrlimit 1 0.1267
22572 dd set_robust_list 1 0.1267
Total 200093 25359.9567
Filesystem Syncs:
None.
Per Process Memory (K):
PID Process Type Size RSS PSS
22572 dd Stack 136 20 20
22572 dd Heap 60 44 44
22572 dd Mapped 15500 876 158
Change in memory (K/second):
PID Process Type Size RSS PSS
22572 dd Stack 0.00 1.52 1.52 (growing)
22572 dd Heap 7.60 5.58 5.58 (growing moderately fast)
22572 dd Mapped 1936.09 110.01 19.01 (growing very fast)
Heap Change via brk():
PID brk Count Change (K) Rate (K/second)
22572 dd 3 29252 3707.42 (growing very fast)
Memory Change via mmap() and munmap():
PID mmaps munmaps Change (K) Rate (K/second)
22572 dd 17 2 15263 1934.46 (growing fast)
Open Network Connections:
None.
Source and packages:
Tracing Overhead:
health-check uses ptrace to monitor system calls with a single ptrace thread monitoring each thread of the programme being executed. This will slow down system call intensive applications. Applications that do infrequent system calls the overhead is very low. Instrumenting a tight loop running zero timeout poll() system calls, health-check system call tracing slows the program down by a factor of 125 (on ARMv7) and 95 (on Intel i5).
Running health-check while playing SD video via a flash plugin in firefox while tracing 63 threads, health check doubled the CPU consumption in an Intel i5.
Author: colin.king@canonical.com