Binary Buffers and Python Array Performance

Pre-allocate and Re-use for Performance

Binary packed data is typically used in high-performance situations or for passing data in and out of extensions. You can optimize by avoiding the overhead of allocating a new Python array for each structure. The pack_into() and unpack_from() methods of struct.Struct let you write to and read from a pre-allocated buffer, such as a ctypes string buffer, directly.

Here’s a common packing example that I’ve fleshed out to include a few extra data types. You should probably use ctypes for the container whenever performance is critical, but the difference between a Python array and a ctypes string buffer doesn’t seem that bad for most cases.

#!/usr/bin/python -Ott
import array
import ctypes
import struct
import binascii

# '4s' reserves four bytes for the string; a bare 's' would keep only 'w'
s = struct.Struct('c ? I f 4s')
values = ('z', True, 4, 2.54, 'word')

# Pre-allocated ctypes string buffer, sized to hold exactly one struct
print 'ctypes.string_buffer:'
b = ctypes.create_string_buffer(s.size)
print 'Before packing :', binascii.hexlify(b.raw)

s.pack_into(b, 0, *values)
print 'After packing  :', binascii.hexlify(b.raw)
print 'Unpacked       :', s.unpack_from(b, 0)

# Pre-allocated Python array of chars, same size as the struct
print 'array:'
a = array.array('c', '\0' * s.size)
print 'Before packing :', binascii.hexlify(a)

s.pack_into(a, 0, *values)
print 'After packing  :', binascii.hexlify(a)
print 'Unpacked       :', s.unpack_from(a, 0)

Performance Testing:

I tested both buffer types by filling a Python array and a ctypes string buffer in a loop, calling pack_into and unpack_from 250,000 times with an incrementing double. It’s a silly example usage, but the Python array took a little over 25 seconds to fill, while the ctypes buffer took 22 seconds.

For comparison, I also tested with reallocation of the array on each iteration of the loop. The total time jumped up to a minute and a half.
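The comparison above could be reproduced with something like the following sketch. It is not the original test script: the function names, the one-double record layout, and the use of time.time() are my assumptions, and it is written for Python 3, where array.array('c') no longer exists and 'B' is used instead. Absolute timings will differ from machine to machine.

```python
import array
import ctypes
import struct
import time

RECORD = struct.Struct('d')  # assumed layout: one double per record


def fill_reused(buf, count):
    """Pack and unpack count doubles through one pre-allocated buffer."""
    total = 0.0
    for i in range(count):
        RECORD.pack_into(buf, 0, float(i))
        total += RECORD.unpack_from(buf, 0)[0]
    return total


def fill_reallocated(count):
    """Same loop, but allocate a fresh ctypes buffer on every iteration."""
    total = 0.0
    for i in range(count):
        buf = ctypes.create_string_buffer(RECORD.size)
        RECORD.pack_into(buf, 0, float(i))
        total += RECORD.unpack_from(buf, 0)[0]
    return total


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start


if __name__ == '__main__':
    count = 250000
    reused_array = array.array('B', b'\0' * RECORD.size)
    reused_ctypes = ctypes.create_string_buffer(RECORD.size)
    cases = [
        ('array, reused', fill_reused, (reused_array, count)),
        ('ctypes, reused', fill_reused, (reused_ctypes, count)),
        ('ctypes, reallocated', fill_reallocated, (count,)),
    ]
    for label, func, args in cases:
        _, elapsed = timed(func, *args)
        print('%-20s %.3f s' % (label, elapsed))
```

The running total is only there so each unpacked value is actually used; the interesting number is the elapsed time per case.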

A more specific example:

Binary messages decoded on both ends of a messaging interface can convey a lot of information, and they are quite a bit more efficient than sending structures full of larger things like more arrays, doubles, etc. A binary payload saves a lot of bandwidth when you’re sending messages very frequently.

And besides, you can’t always control the incoming data type when you need to talk to another piece of code through a messaging interface.
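To make the bandwidth point concrete, here is a rough sketch that packs one message into a fixed binary layout and compares its size with a JSON encoding of the same fields. The message layout (sequence number, timestamp, three readings) and the names MESSAGE, encode, decode, and json_size are made up for illustration; a real interface would define its own format.

```python
import json
import struct

# Hypothetical layout: uint32 sequence, double timestamp, three floats.
# '<' selects little-endian with no padding, so the size is 4 + 8 + 12.
MESSAGE = struct.Struct('<I d 3f')


def encode(seq, timestamp, readings):
    """Pack one message into a fixed-size binary payload."""
    return MESSAGE.pack(seq, timestamp, *readings)


def decode(payload):
    """Unpack a payload back into (seq, timestamp, readings)."""
    fields = MESSAGE.unpack(payload)
    return fields[0], fields[1], fields[2:]


def json_size(seq, timestamp, readings):
    """Size in bytes of the same message sent as JSON text."""
    text = json.dumps({'seq': seq, 'timestamp': timestamp,
                       'readings': list(readings)})
    return len(text.encode('utf-8'))


if __name__ == '__main__':
    payload = encode(7, 1234.5, (1.5, 2.25, -3.0))
    print('binary bytes:', len(payload))
    print('json bytes  :', json_size(7, 1234.5, (1.5, 2.25, -3.0)))
    print('decoded     :', decode(payload))
```

Both ends have to agree on the format string ahead of time, which is the trade-off for the smaller payload.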

Pygtk: Check to see if a button is sensitive

Setting and unsetting flags is a no-brainer, but how do you just check to see if a flag such as ‘gtk.SENSITIVE’ is set for something like a button? Here’s an example of how easy it is.

Use the bitwise AND operator, ‘&’, with the flag you’re interested in.

It’s pretty simple but hard to find this answer when you’re looking for a method to extract a bool value. I wish it was as easy as using ‘is_sensitive()’ that returns true if the widget flag ‘gtk.SENSITIVE’ is set. But use the ampersand with the flags and you’re all set.
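If the bitwise check is unfamiliar, this small sketch shows the idea without needing gtk installed. The flag constants are stand-in values I picked for illustration, not the real values from the gtk module, and has_flag is a hypothetical helper.

```python
# Stand-in bit flags; the real values come from the gtk module.
SENSITIVE = 1 << 0
VISIBLE = 1 << 1
HAS_FOCUS = 1 << 2


def has_flag(flags, flag):
    """Return True if every bit of flag is set in flags."""
    return (flags & flag) == flag


if __name__ == '__main__':
    widget_flags = SENSITIVE | VISIBLE
    print('sensitive:', has_flag(widget_flags, SENSITIVE))
    print('has focus:', has_flag(widget_flags, HAS_FOCUS))

    # Clearing the bit makes the same check come back False
    widget_flags &= ~SENSITIVE
    print('sensitive after clearing:', has_flag(widget_flags, SENSITIVE))
```

With pygtk the flags would come from the widget itself, so the equivalent call would be has_flag(button.flags(), gtk.SENSITIVE).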


This is specific to pygtk and only applicable to buttons. Really, you should be moving to Qt, because everything from the documentation to the implementation of a button is just that much smoother.

Here’s a short example of checking for a save button’s sensitivity before proceeding to quit the main loop. This is something you may use when a quit button is clicked but changes have been made to a document and you want to give the user one last chance to save before exiting your application.

def quit(self, widget):
  if self.button_save.flags() & gtk.SENSITIVE:
    msg = "Save changes before closing?"
    dialog = gtk.MessageDialog(None, gtk.DIALOG_DESTROY_WITH_PARENT,
            gtk.MESSAGE_INFO, gtk.BUTTONS_YES_NO, msg)
    dialog.set_title("There are unsaved Changes")
    dialog.set_position(gtk.WIN_POS_CENTER)
    dialog.set_keep_above(True)

    response = dialog.run()
    dialog.destroy()
                
    if response != gtk.RESPONSE_YES:
      return False

  gtk.main_quit()

Write simple netsnmp apps in Python

Here are a couple of different ways you can use netsnmp in Python.

I had a hard time finding documentation, and what I did find was old and outdated. I figured most of it out just by playing around with the library.

#!/usr/bin/env python
import sys  # needed for sys.exit() below
import netsnmp

string = 'public'
ver = 1
port = 161
host = '192.168.1.1'

# uptime using method 1
bind1 = netsnmp.Varbind('sysUpTime.0')
# 1 minute load using method 2
bind2 = netsnmp.Varbind('.1.3.6.1.4.1.2021.10.1.3.1')

snmpget = netsnmp.snmpget(bind1,
                    Version=ver,
                    RemotePort=port,
                    DestHost=host,
                    Community=string)
uptime_seconds = snmpget[0]
print uptime_seconds

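# A Session can fetch several varbinds in a single request
varbinds = ( bind1, bind2 )  # renamed so it no longer shadows the built-in list
x = netsnmp.Session(DestHost=host,
                    Version=ver,
                    RemotePort=port,
                    Timeout=400000,
                    Retries=5,
                    Community=string)
output_list = x.get(varbinds)
if not output_list:
    print "FAILED TO CONNECT!!!"
    sys.exit(1)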

if output_list[0]:
    uptime = output_list[0]

if output_list[1]:
    load1 = output_list[1]
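One thing to watch with sysUpTime: SNMP defines it as TimeTicks, which are hundredths of a second, not seconds. The helper below is a small sketch for turning that into something readable. It assumes the binding hands the value back as a string of raw ticks, which I have not verified for every netsnmp version, and ticks_to_uptime is a hypothetical name.

```python
def ticks_to_uptime(raw_ticks):
    """Convert SNMP TimeTicks (1/100 s) to (days, hours, minutes, seconds).

    Assumes raw_ticks is an int or a string of digits.
    """
    seconds = int(raw_ticks) // 100
    days, remainder = divmod(seconds, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, secs = divmod(remainder, 60)
    return days, hours, minutes, secs


if __name__ == '__main__':
    # Example value only; in the script above it would be output_list[0]
    days, hours, minutes, secs = ticks_to_uptime('8646150')
    print('up %d days, %02d:%02d:%02d' % (days, hours, minutes, secs))
```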

Next I wrote a class to wrap it up as an AWN applet. If you’ve never heard of AWN or haven’t tried the avant-window-navigator, you should definitely check it out and consider continuing development on it. It was the best app bar available; very pretty. It fit my needs at the time anyway.

I replaced the bottom gnome-panel with it. If you remove everything except the Launcher/Taskmanager applet and add the Show Desktop applet, it completely replaces gnome-panel’s functionality.

File Check Hash Generator – Recursive Tripwire

You can use this to check whether anyone has modified, updated, upgraded, added, or removed any files on your system. After you’ve configured a system the way you want it, dump hash files for all the important directories (/etc, /bin, /usr/local, etc.), or just dump the whole thing. Move the output to another system. Now if you want to check whether something has changed, you can hash the file(s) in question and grep for the hash.
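The check step could also be scripted instead of grepping by hand. This is only a sketch: it assumes the dump file holds plain md5sum-style lines (the digest, two spaces, then the path), and the helper names md5_of_file, load_hashes, and is_unchanged are my own.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def load_hashes(hash_file):
    """Parse 'digest  path' lines into a {path: digest} dictionary."""
    known = {}
    with open(hash_file) as handle:
        for line in handle:
            digest, _, path = line.rstrip('\n').partition('  ')
            if path:
                known[path] = digest
    return known


def is_unchanged(path, known):
    """True only if path was recorded and still hashes to the same digest."""
    if path not in known:
        return False
    return known[path] == md5_of_file(path)
```

A file that was never recorded counts as changed here, which also flags files that were added after the dump was taken.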

A directory like /etc has many subdirectories with subdirectories of their own – not a problem. When the script encounters a directory, it recursively calls itself, so it will parse all child directories. Skipping special files should avoid the problem of probing char files, proc, and other gotchas. I know it could be better. There are things like pid files that are useless to hash.

This was just a quick stab at it. Feel free to adapt this to your own needs as you see fit.

Bash script:

#!/bin/bash
md5sum=/usr/bin/md5sum # hash algorithm to use
mkdir=/bin/mkdir
indir=${1} # base input directory to start hashing files
outfile=${2} # full path of output file

if [ -z "${indir}" ] || [ -z "${outfile}" ]; then
  echo "Usage: $0 <indir> <outfile>"
  echo "  ex: $0 /etc /root/etc.hash"
  exit 1
fi

# Glob instead of parsing ls, and quote paths so names with spaces work
for path in "${indir}"/*; do
  if [ -d "${path}" ]; then # is a dir
    echo "[ Recursively hashing ${path} ]"
    "$0" "${path}" "${outfile}" # pass new path in
    if [ $? != 0 ]; then # recursive call failed, die
      echo "Could not hash ${path}"
      exit 1
    fi
  else # is not a dir
    if [ -f "${path}" ]; then # regular files only
      ${md5sum} "${path}" >> "${outfile}"
    fi
  fi
done

exit 0
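For comparison, the same recursive walk can be sketched in Python with os.walk and hashlib. This is an alternative to the Bash script, not a port of it: the function names are my own, it appends lines in the md5sum "digest, two spaces, path" layout, and unlike the script it skips symlinks and also picks up dotfiles.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(indir):
    """Yield (digest, path) for each regular file below indir."""
    for root, _dirs, files in os.walk(indir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Skip symlinks, devices, sockets and other special files
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            yield md5_of_file(path), path


def write_hashes(indir, outfile):
    """Append one line per file to outfile; return how many were hashed."""
    count = 0
    with open(outfile, 'a') as out:
        for digest, path in hash_tree(indir):
            out.write('%s  %s\n' % (digest, path))
            count += 1
    return count
```

Calling write_hashes('/etc', '/root/etc.hash') would mirror the example invocation from the usage message.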