Some Useful Python Utilities

There are a number of small Python functions that I have copied around to several of my projects, both professional and personal. They are unrelated but generally useful and I have put them into a single repository at gitlab.com/johngoetz/pyutil. Here, I’ll talk about the smaller and more useful methods that I use almost daily.

Back-Porting Features to Older Versions of Python

When dealing with many different systems and platforms, such as in automated testing infrastructure for desktop application software, we can’t always keep Python up to date. This shouldn’t stop us from using new features, however, and there are a few specific methods that I always back-port to earlier versions of Python.

Add follow_symlinks argument to `pathlib.Path.stat()`

I want to use the pathlib module everywhere instead of os.path and to that end, I had to add this to the pathlib.Path class. It’s pretty straight-forward, just adding a single argument to the stat() method:

import os
import pathlib
import sys

if sys.version_info < (3, 10):
    def pathlib_Path_stat(self, follow_symlinks=True):
        return os.stat(self, follow_symlinks=follow_symlinks)
    pathlib.Path.stat = pathlib_Path_stat

Example usage:

if pathlib.Path('secret.txt').stat().st_mode & 0o777 != 0o600:
    raise PermissionError('file permissions must be user-read-only')
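To see what the extra argument buys, here is a minimal sketch that compares a symlink with its target. It assumes a system where an unprivileged user can create symlinks (typical on Linux and macOS), and the helper name `reports_symlink` is made up for this illustration.

```python
import os
import pathlib
import stat
import sys
import tempfile

# Same back-port as above, repeated so this sketch runs on its own.
if sys.version_info < (3, 10):
    def pathlib_Path_stat(self, follow_symlinks=True):
        return os.stat(self, follow_symlinks=follow_symlinks)
    pathlib.Path.stat = pathlib_Path_stat


def reports_symlink(path):
    """Hypothetical helper: True if the path itself is a symlink."""
    return stat.S_ISLNK(path.stat(follow_symlinks=False).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / 'target.txt'
    target.write_text('data')
    link = pathlib.Path(tmp) / 'link.txt'
    link.symlink_to(target)

    link_is_link = reports_symlink(link)      # stat of the link itself
    target_is_link = reports_symlink(target)  # stat of a regular file
```

With `follow_symlinks=False` the call behaves like `lstat()`, so only the link is reported as a symlink.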

Allow `pathlib.Path` objects with `shutil.move()`

It took until Python 3.9 for the shutil.move() method to be pathlib-aware. But we can add this small bit of code to get it for all previous 3.x versions. Notice, all it’s doing is casting the input src and dst arguments to strings:

import shutil
import sys

if sys.version_info < (3, 9):
    _shutil_move = shutil.move
    def shutil_move(src, dst, *args, **kwargs):
        return _shutil_move(str(src), str(dst), *args, **kwargs)
    shutil.move = shutil_move

Usage is the same as the old shutil.move() but with pathlib.Path instances:

src = pathlib.Path('./source')
dst = pathlib.Path('/path/to/destination')
shutil.move(src, dst)
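The case that tends to bite on older versions is moving a `pathlib.Path` into an existing directory. The following sketch exercises that case in a throwaway temporary directory; the file and directory names are placeholders.

```python
import pathlib
import shutil
import sys
import tempfile

# Same back-port as above, repeated so this sketch runs on its own.
if sys.version_info < (3, 9):
    _shutil_move = shutil.move
    def shutil_move(src, dst, *args, **kwargs):
        return _shutil_move(str(src), str(dst), *args, **kwargs)
    shutil.move = shutil_move

with tempfile.TemporaryDirectory() as tmp:
    tmp = pathlib.Path(tmp)
    src = tmp / 'source.txt'
    src.write_text('payload')
    dst_dir = tmp / 'destination'
    dst_dir.mkdir()

    # Move a Path into an existing directory given as a Path.
    shutil.move(src, dst_dir)

    moved = dst_dir / 'source.txt'
    moved_ok = moved.read_text() == 'payload' and not src.exists()
```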

Backporting arguments to `subprocess.run()`

In Python 3.7, subprocess.run() gained the text and capture_output arguments. I use these all the time, and as long as I still have systems that only go up to Python 3.6, I include this patch. It’s a bit more complicated than the previous patches since the capture_output argument has to be dispatched to both the stdout and stderr arguments, and text is actually an alias for universal_newlines.

import subprocess
import sys

if sys.version_info < (3, 7):
    _subprocess_run = subprocess.run
    def subprocess_run(*args, **kwargs):
        # only override universal_newlines when text was actually passed
        if 'text' in kwargs:
            kwargs['universal_newlines'] = kwargs.pop('text')
        if kwargs.pop('capture_output', False):
            kwargs['stdout'] = subprocess.PIPE
            kwargs['stderr'] = subprocess.PIPE
        return _subprocess_run(*args, **kwargs)
    subprocess.run = subprocess_run

And the usage of subprocess.run() becomes a little easier:

proc = subprocess.run(['ls', '-l'], text=True, capture_output=True)
results = proc.stdout.split('\n')
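As a portable check that does not depend on `ls` being available, the sketch below runs the current interpreter as the child process. The copy of the patch here only sets `universal_newlines` when `text` is passed; the child script is purely illustrative.

```python
import subprocess
import sys

# Back-port repeated so this sketch runs on its own.
if sys.version_info < (3, 7):
    _subprocess_run = subprocess.run
    def subprocess_run(*args, **kwargs):
        if 'text' in kwargs:
            kwargs['universal_newlines'] = kwargs.pop('text')
        if kwargs.pop('capture_output', False):
            kwargs['stdout'] = subprocess.PIPE
            kwargs['stderr'] = subprocess.PIPE
        return _subprocess_run(*args, **kwargs)
    subprocess.run = subprocess_run

# Illustrative child process: writes one line to each stream.
child = 'import sys; print("out"); print("err", file=sys.stderr)'
proc = subprocess.run([sys.executable, '-c', child],
                      text=True, capture_output=True, check=True)

stdout_lines = proc.stdout.splitlines()  # str, not bytes, because text=True
stderr_lines = proc.stderr.splitlines()
```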

Adding `shlex.join()` and mapping arguments to strings

This one is both a back-port and a modification of the current version (as of Python 3.10). It implements shlex.join(), added in Python 3.8, which joins a list of command arguments into a single string with all the proper quoting needed. The added feature is that it casts all items in the list to strings, which is something I believe the core function should do already.

import shlex

def shlex_join(split_command):
    return ' '.join(map(shlex.quote, map(str, split_command)))
shlex.join = shlex_join

I use this mostly for printing out commands that are run in a debugging/logging context. I highly recommend passing commands as lists to subprocess.run() instead of strings because quoting can be very tricky with multiple platforms and multiple shells (consider trying to use the same code for Bash and PowerShell):

topdir = pathlib.Path('.')
cmd = ['ls', '-l', topdir]
log.info(f'running command: {shlex.join(cmd)}')
subprocess.run(cmd)
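A short sketch of what the string conversion and quoting do together: the command below mixes plain strings, a `pathlib.Path` containing a space and an integer, then splits the joined string again. The command itself is a made-up example, and note that `shlex.quote()` targets POSIX shells, so the joined string is best treated as log output rather than something to feed to other shells.

```python
import pathlib
import shlex


def shlex_join(split_command):
    # Convert every item to str first, then quote each one for a POSIX shell.
    return ' '.join(map(shlex.quote, map(str, split_command)))


# Illustrative command with a quote, a space in a path, and a non-string item.
cmd = ['grep', '-n', "it's here", pathlib.Path('my notes.txt'), 42]
line = shlex_join(cmd)

# Splitting the joined string recovers the arguments as strings.
round_trip = shlex.split(line)
```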

Environment Variables

It’s common for me to prepend or append paths to environment variables, and I want to do it in a platform-independent way. On Windows, the separator between paths is a semicolon, while on Linux and macOS it’s a colon. With Python’s str.join() method and os.pathsep, I have these three functions:

import os

def join_paths(*paths):
    return os.pathsep.join(s for p in paths for s in p.split(os.pathsep) if s)

def append_paths_to_env(env, key, *paths):
    env[key] = join_paths(*([env.get(key, '')] + list(paths)))

def prepend_paths_to_env(env, key, *paths):
    env[key] = join_paths(*(list(paths) + [env.get(key, '')]))

The join_paths(*paths) function is just a helper for the append and prepend functions, which look up a key in a dict and add one or more paths to its value. I had originally set the default value for env to os.environ, but this made the calls harder to understand. Besides, I rarely want to modify os.environ directly and will always make a copy first. For example:

>>> env = dict(PATH='/usr/bin:/usr/sbin:/bin:/sbin')
>>> prepend_paths_to_env(env, 'PATH', '/usr/local/bin')
>>> print(env['PATH'])
/usr/local/bin:/usr/bin:/usr/sbin:/bin:/sbin
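Two details are easy to miss: a variable that is not set yet does not produce a stray separator, and a single argument may itself hold several paths. The sketch below shows both, working on a copy of `os.environ`; the variable name `DEMO_LIBRARY_PATH` and the paths are placeholders.

```python
import os


def join_paths(*paths):
    return os.pathsep.join(s for p in paths for s in p.split(os.pathsep) if s)


def append_paths_to_env(env, key, *paths):
    env[key] = join_paths(*([env.get(key, '')] + list(paths)))


def prepend_paths_to_env(env, key, *paths):
    env[key] = join_paths(*(list(paths) + [env.get(key, '')]))


# Work on a copy so the current process environment is left alone.
child_env = dict(os.environ)
child_env.pop('DEMO_LIBRARY_PATH', None)  # placeholder variable name

# The key is missing: the empty default is filtered out by "if s".
append_paths_to_env(child_env, 'DEMO_LIBRARY_PATH', '/opt/lib')
after_append = child_env['DEMO_LIBRARY_PATH']

# One argument may already contain several separator-joined paths.
two_paths = os.pathsep.join(['/a', '/b'])
prepend_paths_to_env(child_env, 'DEMO_LIBRARY_PATH', two_paths, '/c')
after_prepend = child_env['DEMO_LIBRARY_PATH'].split(os.pathsep)
```

The resulting dict can then be passed as the `env` argument of `subprocess.run()`.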

I also like to allow for environment variables to override defaults in command-line arguments. For example, I might set the default output directory to the current working directory (.) but have the variable PROG_OUTDIR override it if set. For this, I have a simple recursive method:

import os

def env(key, *alternate_keys, default=None, environ=os.environ):
    if alternate_keys:
        return environ.get(key, env(*alternate_keys, default=default, environ=environ))
    else:
        return environ.get(key, default)

And to get the directory from the environment, falling back to a default is now a single function call:

cwd = env('PROG_OUTDIR', default='.')
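The alternate keys are tried in order, so the first variable that is set wins. A sketch with a plain dict standing in for the real environment, wired into an `argparse` default; the variable name `OUTDIR` and the `--outdir` option are assumptions for illustration.

```python
import argparse
import os


def env(key, *alternate_keys, default=None, environ=os.environ):
    if alternate_keys:
        return environ.get(key, env(*alternate_keys, default=default, environ=environ))
    else:
        return environ.get(key, default)


# A plain dict stands in for os.environ so the result is predictable.
fake_environ = {'OUTDIR': '/tmp/out'}

# PROG_OUTDIR is not set, so the alternate key OUTDIR is used.
from_alternate = env('PROG_OUTDIR', 'OUTDIR', default='.', environ=fake_environ)

# Neither key is set, so the default is returned.
from_default = env('PROG_OUTDIR', 'OUTDIR', default='.', environ={})

# The environment supplies the default; the command line still wins.
parser = argparse.ArgumentParser()
parser.add_argument('--outdir', default=from_alternate)
outdir_env = parser.parse_args([]).outdir
outdir_cli = parser.parse_args(['--outdir', 'build']).outdir
```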

File Operations

There are a few filesystem operations that I do all the time. The following functions are my most-used. They range from getting the size of all files under a directory to functions I use as the type for command line arguments to ensure the argument value is an existing directory or a file with the right permissions.

Directory Size

Building on the pathlib.Path.stat() method patched into older versions of Python above, we can get the total size of all files under a given directory:

def directory_size(topdir):
    files = filter(pathlib.Path.is_file, pathlib.Path(topdir).rglob('*'))
    file_sizes = map(lambda p: p.stat(follow_symlinks=False).st_size, files)
    return sum(file_sizes)

Be careful when running this on large directories, as it is not optimized at all and may take quite some time:

total_bytes = directory_size(pathlib.Path('.'))
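If speed matters, one possible alternative is to walk the tree with `os.scandir()`, which can often answer the file-or-directory question without an extra system call per entry. This is only a sketch, not the version in the repository: the name `directory_size_scandir` is made up, and unlike the function above it skips symlinks entirely instead of counting the size of the link.

```python
import os
import pathlib
import tempfile


def directory_size_scandir(topdir):
    """Sum the sizes of regular files under topdir, ignoring symlinks."""
    total = 0
    stack = [os.fspath(topdir)]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)  # visit subdirectory later
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total


# Small self-check on a temporary tree: 3 bytes + 5 bytes.
with tempfile.TemporaryDirectory() as tmp:
    top = pathlib.Path(tmp)
    (top / 'a.txt').write_bytes(b'abc')
    (top / 'sub').mkdir()
    (top / 'sub' / 'b.bin').write_bytes(b'12345')
    total_bytes = directory_size_scandir(top)
```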

Creating a Closed Temporary File

Some, rather limited, operating systems prevent the user from accessing files when they are opened by another process or thread. This prevents us from using the tempfile.NamedTemporaryFile object as-is across all platforms. In order to support all use-cases, I wrote the following ClosedTemporaryFile which creates a temporary file, closes it and yields the path as a pathlib.Path object. The file is then deleted according to the delete argument when exiting the context.

import contextlib
import tempfile
import pathlib

@contextlib.contextmanager
def ClosedTemporaryFile(*args, **kwargs):
    delete = kwargs.pop('delete', True)
    kwargs['delete'] = False
    ftmp = tempfile.NamedTemporaryFile(*args, **kwargs)
    try:
        fpath = pathlib.Path(ftmp.name)
        ftmp.close()
        yield fpath
    finally:
        if delete:
            fpath.unlink()

Example usage:

with ClosedTemporaryFile() as ftmp:
    # do something with ftmp, for example:
    ftmp.write_text('hello')
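A slightly fuller sketch of the life cycle: the path can be reopened freely inside the block because the original handle is already closed, and the file is gone once the block exits. The `suffix` argument is simply forwarded to `tempfile.NamedTemporaryFile`.

```python
import contextlib
import pathlib
import tempfile


@contextlib.contextmanager
def ClosedTemporaryFile(*args, **kwargs):
    delete = kwargs.pop('delete', True)
    kwargs['delete'] = False
    ftmp = tempfile.NamedTemporaryFile(*args, **kwargs)
    try:
        fpath = pathlib.Path(ftmp.name)
        ftmp.close()
        yield fpath
    finally:
        if delete:
            fpath.unlink()


with ClosedTemporaryFile(suffix='.txt') as ftmp:
    ftmp.write_text('hello')       # reopen by path for writing
    contents = ftmp.read_text()    # and again for reading
    existed_inside = ftmp.exists()

# After the block, the file has been removed (delete defaults to True).
exists_after = ftmp.exists()
```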

Ensure input argument is an existing directory

I typically use argparse to handle command line arguments. When adding options that must be existing directories, this function can be used for the type:

def ExistingDir(p):
    p = pathlib.Path(p).expanduser()
    if not p.is_dir():
        raise FileNotFoundError(f'{p} is not a directory')
    return p

For example:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--topdir', type=ExistingDir, default='.',
    help='''Top-level directory (must exist, default: ".").''')
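One thing to be aware of: argparse only turns `ArgumentTypeError`, `TypeError` and `ValueError` raised by a type function into a usage message, so a `FileNotFoundError` surfaces as a traceback. If a clean error message is preferred, a variant along these lines may work; the name `ExistingDirArg` is made up here and this is a sketch rather than the version in the repository.

```python
import argparse
import pathlib
import tempfile


def ExistingDirArg(p):
    """Variant of ExistingDir that reports problems through argparse."""
    p = pathlib.Path(p).expanduser()
    if not p.is_dir():
        raise argparse.ArgumentTypeError(f'{p} is not a directory')
    return p


parser = argparse.ArgumentParser()
parser.add_argument('--topdir', type=ExistingDirArg, default='.')

with tempfile.TemporaryDirectory() as tmp:
    # An existing directory is converted to a pathlib.Path.
    good = parser.parse_args(['--topdir', tmp]).topdir

    # A missing directory makes argparse print usage and exit.
    missing = str(pathlib.Path(tmp) / 'missing')
    exit_code = None
    try:
        parser.parse_args(['--topdir', missing])
    except SystemExit as exc:
        exit_code = exc.code
```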

Handle files containing a password or other secret

The last snippet of code I’ll present here is a function I use for files containing private keys or passwords. Similar to SSH keys, I want to force the permissions of the file to be readable by the user only (600 on POSIX systems). Note that I have not yet taken the time to learn what the equivalent is on Windows, if it even has an equivalent.

For the function Secret() below, the input is either a string, in which case it’s merely passed through, or the path to an existing file. In the latter case, the file’s permissions are verified and the contents read. The output always has leading and trailing whitespace stripped, which may have been added by editors and such.

import pathlib
import platform

def Secret(s):
    p = pathlib.Path(s).expanduser()
    if p.is_file():
        if platform.system() != 'Windows' and p.stat().st_mode & 0o777 != 0o600:
            msg = 'file containing secret must have permissions set to 600'
            raise PermissionError(msg)
        s = p.read_text()
    return s.strip()

I use this often for command line argument handling of passwords:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--password', type=Secret, default='~/.password',
    help='''The password or a file containing the password
            which must have permissions 600.
            (default: "~/.password").''')
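The three behaviours (read from a file, pass a string through, reject loose permissions) can be exercised with a temporary file. This sketch assumes a POSIX system, since the permission check is skipped on Windows, and the password value is obviously a placeholder.

```python
import os
import pathlib
import platform
import tempfile


def Secret(s):
    p = pathlib.Path(s).expanduser()
    if p.is_file():
        if platform.system() != 'Windows' and p.stat().st_mode & 0o777 != 0o600:
            msg = 'file containing secret must have permissions set to 600'
            raise PermissionError(msg)
        s = p.read_text()
    return s.strip()


with tempfile.TemporaryDirectory() as tmp:
    keyfile = pathlib.Path(tmp) / 'password'
    keyfile.write_text('hunter2\n')  # placeholder secret
    os.chmod(keyfile, 0o600)

    from_file = Secret(keyfile)         # contents, trailing newline stripped
    from_string = Secret('  hunter2 ')  # not a file: passed through, stripped

    # Loosen the permissions: on POSIX the file is now rejected.
    os.chmod(keyfile, 0o644)
    try:
        Secret(keyfile)
        rejected = False
    except PermissionError:
        rejected = True
```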

My personal `pyutil` Python package

I maintain all the above functions and a whole lot more in a single repository on GitLab and typically reference it directly in many of my Python modules that are used in production across many systems. In the pyproject.toml file, I add the following:

[project]
dependencies = [ 'pyutil @ git+https://gitlab.com/johngoetz/pyutil.git' ]

The package may be installed directly using pip:

python -mpip install git+https://gitlab.com/johngoetz/pyutil.git