Musings of the lazy developer: linux

Showing posts with label linux. Show all posts

Friday, January 3, 2014

Sleeping for a negative interval in perl

Stumbled on this behavior accidentally when walking through some buggy Perl code. The gist is:

> perl -e 'sleep(1);print "done\n"'
done
> perl -e 'sleep(-1);print "done\n"'

... never completes!

I tried to trove through the perldoc and the underlying linux man page for sleep but couldn't find an explicit explanation. I guess because the underlying sleep method in unistd.h accepts unsigned int, perl's negative 1 is interpreted as 32-bit unsigned integer which is 2^31 - waiting for 2^31 seconds - which is a loong time making us think that it is never completing. To put this theory to test, we tried out something like:

> perl -e 'sleep(-0.0000000000000000000000000000001);print "done\n"'
done

and it completed! Hope this helps someone else as well!

EDIT:

Looks like the fractional example is not a good one - looking through Perl's source code, it plainly seems that sleep is modeled as a macro:

#define PerlProc_sleep(t) sleep((t))

So the floating point expression was being plainly truncated. A better test case would be 2^32 - which can't be fit in a 32 bit signed integer (the default chosen) and when overflowing would be interpreted as 1 and so on.

> date ; perl -e 'sleep(4294967296);print "done\n"' ; date
Mon Jan  6 02:12:27 EST 2014
done
Mon Jan  6 02:12:27 EST 2014
> date ; perl -e 'sleep(4294967297);print "done\n"' ; date
Mon Jan  6 02:12:34 EST 2014
done
Mon Jan  6 02:12:35 EST 2014
> date ; perl -e 'sleep(4294967298);print "done\n"' ; date
Mon Jan  6 02:12:42 EST 2014
done
Mon Jan  6 02:12:44 EST 2014

EDIT 2:

A colleague of mine pointed out a couple of things. A more straight-forward way of confirming the hypothesis is via strace. Firstly, we need to know the underlying Linux system call that is being invoked by the Perl runtime. strace -v on the perl one-liner of sleep(1) will help you identify that it is nanosleep. strace -v -e nanosleep on the perl one-line of sleep(-1) will show that it will indeed try to sleep for 4294967295 seconds. Secondly, Time::HiRes module of Perl, is a better alternative for such sleeps, because it catches these negative usages of sleep interval with a cocky "negative time not invented yet" croak.

References:

Wednesday, June 12, 2013

Know your disk latency numbers

I was looking around for a tool that will help quantify the latency difference between a NFS-mounted location to a local location and came across ioping. As the documentation indicates, it tries to represent disk latency in pretty much the same way as the ping command shows network latency to a desired host.

I had to build it from source for our linux installation - but the process went about without a hitch. I first tried it on my NFS mounted home drive where the checked-out code, eclipse workspace and maven repository reside and is the hotspot for IO activity when developing. Using a sample invocation from the documentation site, I tried it for a chunk of size 1MB, ten times.

$>ioping -c 10 -s 1M
--- /home/kilo(nfs fs1.kilo.com:/vol/home/kilo) ioping statistics ---
10 requests completed in 9101.5 ms, 102 iops, 102.1 mb/s
min/avg/max/mdev = 9.5/9.8/10.0/0.2 ms

Next was to test out the performance of our local partition with the same parameters:

--- /local/kilo (ext4 /dev/ssda) ioping statistics ---
10 requests completed in 9052.4 ms, 201 iops, 200.8 mb/s
min/avg/max/mdev = 4.8/5.0/5.6/0.2 ms

Result was roughly a speed-up of 200%!

For kicks, I tried it out on a /tmp location as well with the same parameters:

--- /tmp (tmpfs none) ioping statistics ---
10 requests completed in 9004.1 ms, 5219 iops, 5219.2 mb/s
min/avg/max/mdev = 0.1/0.2/0.3/0.0 ms

That was roughly 30 times faster - but hold on - what is that tmpfs that is mentioned. Digging further, it seems tmpfs is a special filesystem that doesn't reside on a physical disk but rather stored in the physical memory (albeit volatile) - so the speed-up in access is kind of expected. Which also means that we should be extra nice about what goes in /tmp in such a setup. Keeping a lot of garbage in there and not cleaning up will come and bite us at unpredictable times. Guess many people already know about this

Coming back to the utility itself, there are options to change working set sizes, offsets into files that wish to be seeked, etc. One particularly interesting feature is the option to use write IO, but that option was not available in the 0.6 that I downloaded. An issue indicates that this is a yet to be released feature and will be interesting to see it in action.

I think this will a good utility to have in our linux installations by default. If there are other utilities that do similar stuff but comes out of the box from a stock RHEL installation, please let me know. Hope this helps!

References:

1. https://code.google.com/p/ioping/

2. http://en.wikipedia.org/wiki/Tmpfs

3. https://code.google.com/p/ioping/wiki/man

Monday, October 29, 2012

Classpath of a running java process spawned via maven exec:java

Ran into a peculiar situation where I was using the maven exec:java plugin to launch a java process. The plugin does the heavy lifting of creating the correct classpath comprising of its dependencies (much like what JUnit does). In this case, I suspected the java and maven to be conspiring against me by picking up a staler jar. In the case of Tomcat (or other containers) it is usually easy to look at WEB-INF/lib for the jars that will be used. But in this case, the command line didn't yield much information.

> ps -ef | grep produser| grep java
munshi     953 27405  0 01:21 pts/0    00:00:00 grep java
munshi   31771 31758 11 01:14 pts/0    00:00:50 /usr/local/java/jdk/bin/java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8765,suspend=n -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8219 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath /tools/apache/maven/boot/plexus-classworlds-2.4.jar -Dclassworlds.conf=/tools/apache/maven/bin/m2.conf -Dmaven.home=/tools/apache/maven org.codehaus.plexus.classworlds.launcher.Launcher --settings /u/produser/.m2/settings.xml clean compile exec:java -Dexec.mainClass=com.kilo.DriverCLI -Dexec.args=SOMETHING

The -classpath /tools/maven/boot/plexus-classworlds-2.4.jar didn't help much here. In JUnit, a whole file is created with the list of jars being used and it is very easy to inspect. There are intrusive ways to print the classpath for a application by either 1. setting -verbose:class when starting the process or by explicitly requesting in code with the system property "java.class.path". I fired up JVisualVM to see if it can help me examine the system property non-instrusively (since I really didn't want to restart the application for investigative reasons -- just yet). Again it showed the /tools/maven/boot/plexus-classworlds-2.4.jar as the value. That's when I came across this alternate way where you use the lsof command to see which files are open. Since the classloader has to refer to the JAR from where the class is being loaded, it shows up in the list of open files - however not sure if this is a sureshot way to get things done - what if the JVM decides to close off the connection to the JAR as a way of reducing number of open file descriptors.

> lsof -p 31771 | less
COMMAND   PID   USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
java    31771 produser  cwd    DIR               0,70     4096 13085795 /u/produser/testproject (filer.kilo.com:/vol/proj1/proj-302)
java    31771 produser  rtd    DIR              253,0     4096        2 /
java    31771 produser    txt    REG              253,0     7630  1465254 /usr/local/java/jdk1.7.0.04/bin/java
java    31771 produser  mem    REG              253,0   156872  1441803 /lib64/ld-2.12.so
java    31771 produser  mem    REG              253,0  1922112  1441807 /lib64/libc-2.12.so
java    31771 produser  mem    REG              253,0    22536  1444029 /lib64/libdl-2.12.so
java    31771 produser  mem    REG              253,0   145720  1441859 /lib64/libpthread-2.12.so
java    31771 produser  mem    REG              253,0   598800  1444297 /lib64/libm-2.12.so
java    31771 produser  mem    REG              253,0    47064  1441866 /lib64/librt-2.12.so
java    31771 produser  mem    REG              253,0   113952  1444031 /lib64/libresolv-2.12.so
java    31771 produser  mem    REG              253,0   124624  1444032 /lib64/libselinux.so.1
java    31771 produser  mem    REG              253,0    17256  1444294 /lib64/libcom_err.so.2.1
java    31771 produser  mem    REG              253,0    12592  1444030 /lib64/libkeyutils.so.1.3
java    31771 produser    mem    REG              253,0   915104  1444295 /lib64/libkrb5.so.3.3
java    31771 produser  mem    REG              253,0    43304  1444292 /lib64/libkrb5support.so.0.1
java    31771 produser  mem    REG              253,0   181608  1444293 /lib64/libk5crypto.so.3.1
java    31771 produser  mem    REG              253,0   268944  1444296 /lib64/libgssapi_krb5.so.2.2
java    31771 produser  mem    REG              253,0     6398  1361340 /usr/local/java/jdk1.7.0.04/jre/lib/amd64/librmi.so
java    31771 produser  mem    REG              253,0  1023488  1361282 /usr/local/java/jdk1.7.0.04/jre/lib/ext/localedata.jar
java    31771 produser  mem    REG              253,0  2476995  1361287 /usr/local/java/jdk1.7.0.04/jre/lib/resources.jar
java    31771 produser  mem    REG              253,0     8934  1361283 /usr/local/java/jdk1.7.0.04/jre/lib/ext/dnsns.jar
java    31771 produser  mem    REG               0,71  1501575  1312261 /u/produser/.m2/repository/com/google/guava/guava/10.0.1/guava-10.0.1.jar (filer.kilo.com:/vol/home2/produser)

References:

1. http://thilinamb.wordpress.com/2009/07/01/analyse-the-classpath-of-a-running-java-program/

Wednesday, October 3, 2012

Details of long running and long-forgotten processes

Sometimes we start off a new process and after quite sometime you wish to know details as to how the process was started. Things like current working directory, complete command line, environment variables used. This usually happens for the programs kicked off using a sudo-user and then you mostly quit the shell from where this was invoked. In my specific case, I was looking for the current working directory (to see where it will dump the core) and environment variables in use. In Linux, this is available in /proc/<pid>/ files.

I started browsing other entries and was amazed at the amount of useful information that is being transparently provided by Linux. Some of the notable ones are limits, statm and task. The reference link has details for other interesting details.

Here is the story:

I remember that I started the process on a particular host as the produser so log on there

> sudo -u produser ssh localhost

I remember that it is a java process - to get a handle to the process id.

> ps -ef | grep java | grep produser 
produser 8113  7983  0 02:14 pts/10   00:00:00 grep java
produser 21765 21750  0 Aug31 ?        01:46:45 /usr/local/java/jdk/bin/java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8765,suspend=n -Dcom.sun.management.jmxremote.port=8219 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath /tools/apache/maven/boot/classworlds-1.1.jar -Dclassworlds.conf=/tools/apache/maven/bin/m2.conf -Dmaven.home=/tools/apache/maven org.codehaus.classworlds.Launcher "--settings" "/u/produser/.m2/settings.xml" "clean" "compile" "exec:java" "-Dexec.mainClass=com.kilo.DriverCLI" "-Dexec.args=SOMETHING"

Now check the current working directory:

> ls -al /proc/21765/cwd
lrwxrwxrwx 1 produser produser 0 Oct  1 08:04 /proc/21765/cwd -> /u/produser/testproject

Now for the environment settings:

> cat /proc/21765/environ
#All environment variables were listed - elided for privacy reasons

Other interesting info:

> cat /proc/21765/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             16341                16341                processes
Max open files            65536                65536                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256591               256591               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

> cat /proc/21765/task/21771/status
Name: java
State: S (sleeping)
Tgid: 21765
Pid: 21771
PPid: 21750
TracerPid: 0
Uid: 5307 5307 5307 5307
Gid: 5307 5307 5307 5307
Utrace: 0
FDSize: 256
Groups: 5307 6135 6584
VmPeak: 11245832 kB
VmSize: 11243768 kB
VmLck:        0 kB
VmHWM:  3575560 kB
VmRSS:  2966532 kB
VmData: 11064932 kB
VmStk:       88 kB
VmExe:        4 kB
VmLib:    15324 kB
VmPTE:     6360 kB
VmSwap:        0 kB
Threads: 31
SigQ: 0/256591
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000004
SigIgn: 0000000000000000
SigCgt: 2000000181005ccf
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: ff
Cpus_allowed_list: 0-7
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 8189
nonvoluntary_ctxt_switches: 933

References:

1. http://www.kernel.org/doc/man-pages/online/pages/man5/proc.5.html

Thursday, September 13, 2012

Checking stdout and stderr for a process already running

Did you ever run into a scenario where you ran a job (either a perl process or a java process) and forgot to redirect its stdout and stderr to some files and now your term is also gone. After sometime, you think that the process is not working and you are not able to see the error messages being generated. Without really resorting to snooping techniques like ttysnoop or xwindow solutions, I came across a pretty neat way of intercepting those calls (tested in Linux)

strace -ewrite -p <pid>

The strace command traces all the system calls and the signals, and if you start grepping on the write calls. Turned out to be quite useful in a crunch scenario, but one should always think about where the stdout and stderr should be redirected.

Hope this helps!

References:

http://www.linuxquestions.org/questions/linux-general-1/want-to-see-process-output-524033/

http://linux.die.net/man/1/strace

http://www.linuxhelp.net/guides/ttysnoop/

http://www.ouah.org/xsolaris.htm

Musings of the lazy developer