There are a number of small Python functions that I have copied around to several of my projects, both professional and personal. They are unrelated but generally useful and I have put them into a single repository at gitlab.com/johngoetz/pyutil. Here, I’ll talk about the smaller and more useful methods that I use almost daily.
Back-Porting Features from Newer Versions of Python
When dealing with many different systems and platforms, such as in automated testing infrastructure for desktop application software, we can’t always keep Python up to date. This shouldn’t stop us from using new features, however, and there are a few specific functions that I always back-port to earlier versions of Python.
Add follow_symlinks argument to pathlib.Path.stat()
I want to use the pathlib module everywhere instead of os.path and to that end, I had to add this to the pathlib.Path class. It’s pretty straight-forward, just adding a single argument to the stat() method:
import os
import pathlib
import sys
# Path.stat() only accepts follow_symlinks natively from Python 3.10 onward
if sys.version_info < (3, 10):
    def pathlib_Path_stat(self, follow_symlinks=True):
        return os.stat(self, follow_symlinks=follow_symlinks)
    pathlib.Path.stat = pathlib_Path_stat
Example usage:
if pathlib.Path('secret.txt').stat().st_mode & 0o777 != 0o600:
    raise PermissionError('file permissions must be user-read-only')
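The new argument matters most when the path may be a symbolic link. As a minimal sketch, assuming a POSIX-like system where symlinks can be created, the helper below (is_symlink is a hypothetical name, not part of pyutil) inspects the link itself rather than its target:

```python
import os
import pathlib
import stat
import sys

# Same back-port as above, repeated so this sketch runs on its own.
if sys.version_info < (3, 10):
    def _path_stat(self, follow_symlinks=True):
        return os.stat(self, follow_symlinks=follow_symlinks)
    pathlib.Path.stat = _path_stat


def is_symlink(path):
    """Hypothetical helper: True if path itself is a symbolic link."""
    # follow_symlinks=False stats the link itself instead of its target
    own_stat = pathlib.Path(path).stat(follow_symlinks=False)
    return stat.S_ISLNK(own_stat.st_mode)
```

With the default follow_symlinks=True, the same call would report on the target of the link, so S_ISLNK would never be true.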
Allow pathlib.Path objects with shutil.move()
It took until Python 3.9 for the shutil.move() function to be pathlib-aware. But we can add this small bit of code to get it for all previous 3.x versions. Notice that all it does is convert the src and dst arguments to strings:
import shutil
import sys
if sys.version_info < (3, 9):
    _shutil_move = shutil.move
    def shutil_move(src, dst, *args, **kwargs):
        return _shutil_move(str(src), str(dst), *args, **kwargs)
    shutil.move = shutil_move
Usage is the same as the old shutil.move() but with pathlib.Path instances:
src = pathlib.Path('./source')
dst = pathlib.Path('/path/to/destination')
shutil.move(src, dst)
Backporting arguments to subprocess.run()
In Python 3.7, subprocess.run() gained the text and capture_output arguments. I use these all the time, and while I still have systems that only go up to Python 3.6, I include this patch. It’s a bit more complicated than the previous patches since the capture_output argument has to be dispatched to both the stdout and stderr arguments, and text is actually an alias for universal_newlines.
import subprocess
import sys
if sys.version_info < (3, 7):
    _subprocess_run = subprocess.run
    def subprocess_run(*args, **kwargs):
        # text is an alias for universal_newlines; only map it when given,
        # so an explicit universal_newlines argument is not overwritten
        if 'text' in kwargs:
            kwargs['universal_newlines'] = kwargs.pop('text')
        if kwargs.pop('capture_output', False):
            kwargs['stdout'] = subprocess.PIPE
            kwargs['stderr'] = subprocess.PIPE
        return _subprocess_run(*args, **kwargs)
    subprocess.run = subprocess_run
And the usage of subprocess.run() becomes a little easier:
proc = subprocess.run(['ls', '-l'], text=True, capture_output=True)
results = proc.stdout.split('\n')
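The same call can be wrapped in a small helper. This is only a sketch, assuming Python 3.7+ or the patch above is in place; run_lines is a hypothetical name, and it runs the current Python interpreter instead of ls so that it behaves the same on every platform:

```python
import subprocess
import sys


def run_lines(cmd):
    """Hypothetical helper: run cmd (a list of arguments), return stdout lines."""
    # check=True raises CalledProcessError on a non-zero exit status
    proc = subprocess.run(cmd, text=True, capture_output=True, check=True)
    # splitlines() avoids the trailing empty string that split('\n') leaves
    return proc.stdout.splitlines()


lines = run_lines([sys.executable, '-c', 'print("one"); print("two")'])
print(lines)  # ['one', 'two']
```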
Adding shlex.join() and mapping arguments to strings
This one is both a back-port and a modification of the version in the standard library (as of Python 3.10). It implements shlex.join(), added in Python 3.8, which joins a list of command arguments into a single string with all the proper quoting needed. The added feature is that it converts all items in the list to strings, which is something I believe the core function should do already.
import shlex
def shlex_join(split_command):
    return ' '.join(map(shlex.quote, map(str, split_command)))
shlex.join = shlex_join
I use this mostly for printing out commands that are run in a debugging/logging context. I highly recommend passing commands as lists to subprocess.run() instead of strings because quoting can be very tricky with multiple platforms and multiple shells (consider trying to use the same code for Bash and PowerShell):
topdir = pathlib.Path('.')
cmd = ['ls', '-l', topdir]
log.info(f'running command: {shlex.join(cmd)}')
subprocess.run(cmd)
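To make the quoting visible, here is a small sketch using the same shlex_join() on an argument list that contains a path with a space. The directory name and the integer are made up for illustration; the integer is only there to show the str() conversion:

```python
import pathlib
import shlex


def shlex_join(split_command):
    return ' '.join(map(shlex.quote, map(str, split_command)))


cmd = ['ls', '-l', pathlib.Path('my dir'), 42]
printable = shlex_join(cmd)
print(printable)  # ls -l 'my dir' 42

# shlex.split() reverses the quoting, giving back the items as strings
round_trip = shlex.split(printable)
```

The round trip returns strings only, so this form is meant for logging, not for rebuilding the original objects.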
Environment Variables
It’s common for me to prepend or append paths to environment variables in a platform-independent way. On Windows the separator between paths is a semicolon, while on Linux and macOS it’s a colon, so with some use of Python’s str.join() method and os.pathsep I have these three functions:
import os
def join_paths(*paths):
    return os.pathsep.join(s for p in paths for s in p.split(os.pathsep) if s)
def append_paths_to_env(env, key, *paths):
    env[key] = join_paths(*([env.get(key, '')] + list(paths)))
def prepend_paths_to_env(env, key, *paths):
    env[key] = join_paths(*(list(paths) + [env.get(key, '')]))
The join_paths(*paths) function is just a helper for the append and prepend functions, which look up a key in a dict and insert one or more paths into its value. I had originally set the default value for env to os.environ, but this made the calls harder to understand, and besides, I rarely want to modify os.environ directly and will always make a copy first. For example:
>>> env = dict(PATH='/usr/bin:/usr/sbin:/bin:/sbin')
>>> prepend_paths_to_env(env, 'PATH', '/usr/local/bin')
>>> print(env['PATH'])
/usr/local/bin:/usr/bin:/usr/sbin:/bin:/sbin
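The copy-first habit pairs naturally with the env argument of subprocess.run(). The following is a sketch under stated assumptions: env_with_paths is a hypothetical helper, /opt/tool/bin is a made-up directory, and the child process is simply the current Python interpreter printing its own PATH:

```python
import os
import subprocess
import sys


def join_paths(*paths):
    return os.pathsep.join(s for p in paths for s in p.split(os.pathsep) if s)


def prepend_paths_to_env(env, key, *paths):
    env[key] = join_paths(*(list(paths) + [env.get(key, '')]))


def env_with_paths(key, *paths):
    """Hypothetical helper: copy os.environ and prepend paths to env[key]."""
    env = dict(os.environ)  # a copy, so the parent environment is untouched
    prepend_paths_to_env(env, key, *paths)
    return env


child_env = env_with_paths('PATH', '/opt/tool/bin')
proc = subprocess.run(
    [sys.executable, '-c', 'import os; print(os.environ["PATH"])'],
    env=child_env, stdout=subprocess.PIPE, universal_newlines=True, check=True)
```

Only the child process sees the extra entry; os.environ in the parent is unchanged.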
I also like to allow environment variables to override defaults for command-line arguments. For example, I might set the default output directory to the current working directory (.) but have the variable PROG_OUTDIR override it if set. For this, I have a simple recursive function:
import os
def env(key, *alternate_keys, default=None, environ=os.environ):
    if alternate_keys:
        return environ.get(key, env(*alternate_keys, default=default, environ=environ))
    else:
        return environ.get(key, default)
Getting the directory from the environment, falling back to a default, is now a single function call:
cwd = env('PROG_OUTDIR', default='.')
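The alternate keys are tried in order, and the result can feed an argparse default. In this sketch the second variable name OUTDIR and the fake_environ dict are assumptions for illustration; a plain dict is passed as environ so the example does not depend on the real environment:

```python
import argparse
import os


def env(key, *alternate_keys, default=None, environ=os.environ):
    if alternate_keys:
        return environ.get(key, env(*alternate_keys, default=default, environ=environ))
    return environ.get(key, default)


# PROG_OUTDIR is not set here, so the alternate key OUTDIR is used
fake_environ = {'OUTDIR': '/tmp/out'}
outdir = env('PROG_OUTDIR', 'OUTDIR', default='.', environ=fake_environ)

# precedence: command line, then PROG_OUTDIR, then OUTDIR, then '.'
parser = argparse.ArgumentParser()
parser.add_argument('--outdir', default=outdir)
args = parser.parse_args([])
```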
File Operations
There are a few filesystem operations that I do all the time. The following functions are my most-used. They range from getting the size of all files under a directory to functions I use as the type for command-line arguments to ensure the argument value is an existing directory or a file with the right permissions.
Directory Size
Building on the pathlib.Path.stat() method patched into older versions of Python above, we can get the total size of all files under a given directory:
def directory_size(topdir):
    files = filter(pathlib.Path.is_file, pathlib.Path(topdir).rglob('*'))
    file_sizes = map(lambda p: p.stat(follow_symlinks=False).st_size, files)
    return sum(file_sizes)
Be careful when running this on large directories, as it is not optimized at all and may take quite some time:
total_bytes = directory_size(pathlib.Path('.'))
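If speed becomes a problem, one possible alternative is os.scandir(), which can often answer is_dir() and is_file() without an extra system call per entry. This is a sketch, not a measured improvement, and directory_size_scandir is a hypothetical name. Note one difference in behavior: it skips symbolic links entirely, whereas the version above counts the size of links that point to files.

```python
import os


def directory_size_scandir(topdir):
    """Hypothetical variant: total size in bytes of regular files under topdir."""
    total = 0
    stack = [os.fspath(topdir)]  # directories still to visit
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total
```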
Creating a Closed Temporary File
Some, rather limited, operating systems prevent the user from accessing files while they are open in another process or thread. This prevents us from using the tempfile.NamedTemporaryFile object as-is across all platforms. In order to support all use-cases, I wrote the following ClosedTemporaryFile which creates a temporary file, closes it and yields the path as a pathlib.Path object. The file is then deleted according to the delete argument when exiting the context.
import contextlib
import tempfile
import pathlib
@contextlib.contextmanager
def ClosedTemporaryFile(*args, **kwargs):
    # remember the caller's delete preference; deletion is handled below
    delete = kwargs.pop('delete', True)
    kwargs['delete'] = False
    ftmp = tempfile.NamedTemporaryFile(*args, **kwargs)
    try:
        fpath = pathlib.Path(ftmp.name)
        # close the handle so other processes can open the file
        ftmp.close()
        yield fpath
    finally:
        if delete:
            fpath.unlink()
Example usage:
with ClosedTemporaryFile() as ftmp:
    # do something with ftmp, for example write to it by path
    ftmp.write_text('scratch data')
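The typical reason to want a closed file is handing the path to another process. As a sketch, the child process below is just the current Python interpreter running a made-up one-liner that writes to the path it receives; the context manager is repeated so the example runs on its own:

```python
import contextlib
import pathlib
import subprocess
import sys
import tempfile


@contextlib.contextmanager
def ClosedTemporaryFile(*args, **kwargs):
    delete = kwargs.pop('delete', True)
    kwargs['delete'] = False
    ftmp = tempfile.NamedTemporaryFile(*args, **kwargs)
    try:
        fpath = pathlib.Path(ftmp.name)
        ftmp.close()
        yield fpath
    finally:
        if delete:
            fpath.unlink()


# hypothetical child command: writes a fixed string to the file named in argv[1]
child_code = ('import pathlib, sys; '
              'pathlib.Path(sys.argv[1]).write_text("written by child")')

with ClosedTemporaryFile(suffix='.txt') as ftmp:
    subprocess.run([sys.executable, '-c', child_code, str(ftmp)], check=True)
    content = ftmp.read_text()  # read back what the child wrote
    saved_path = ftmp           # kept only to show the file is gone afterwards
```

After the with block, saved_path no longer exists because delete defaults to True.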
Ensure input argument is an existing directory
I typically use argparse to handle command-line arguments. When adding options that must be existing directories, this function can be used for the type:
def ExistingDir(p):
    p = pathlib.Path(p).expanduser()
    if not p.is_dir():
        raise FileNotFoundError(f'{p} is not a directory')
    return p
For example:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument(
    '--topdir', type=ExistingDir, default='.',
    help='''Top-level directory (must exist, default: ".").''')
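One thing to be aware of: as far as I know, argparse only converts ArgumentTypeError, TypeError and ValueError raised by a type function into its usual usage message, so the FileNotFoundError above surfaces as a traceback. If a clean usage error is preferred, a variant could look like the sketch below (ExistingDirArg is a hypothetical name, not the function from pyutil):

```python
import argparse
import pathlib


def ExistingDirArg(p):
    """Hypothetical variant of ExistingDir that reports errors through argparse."""
    p = pathlib.Path(p).expanduser()
    if not p.is_dir():
        # argparse turns this into "error: argument --topdir: ..." and exits
        raise argparse.ArgumentTypeError(f'{p} is not a directory')
    return p


parser = argparse.ArgumentParser()
parser.add_argument('--topdir', type=ExistingDirArg, default='.')
args = parser.parse_args(['--topdir', '.'])
```

The trade-off is that the error type is now specific to argparse, which makes the function a little less convenient to reuse elsewhere.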
Handle files containing a password or other secret
The last snippet of code I’ll present here is a function I use for files containing private keys or passwords. Similar to ssh keys, I want to force the permissions of the file to be readable and writable only by the user (600 on POSIX systems). Note that I have not yet taken the time to learn what the equivalent is on Windows - if it even has an equivalent.
For the function Secret() below, the input is either a string, in which case it’s merely passed through, or an existing file. The file’s permissions are then verified and the contents read. The output always has leading and trailing whitespace stripped, which may be created by editors and such.
import pathlib
import platform
def Secret(s):
    p = pathlib.Path(s).expanduser()
    if p.is_file():
        # the permission check is skipped on Windows
        if platform.system() != 'Windows' and p.stat().st_mode & 0o777 != 0o600:
            msg = 'file containing secret must have permissions set to 600'
            raise PermissionError(msg)
        s = p.read_text()
    return s.strip()
I use this often for command line argument handling of passwords:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument(
    '--password', type=Secret, default='~/.password',
    help='''The password or a file containing the password
    which must have permissions 600.
    (default: "~/.password").''')
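For completeness, here is one way such a file could be created with the right permissions from Python. This is a sketch and not part of pyutil: write_secret_file is a hypothetical helper, and the mode bits are only meaningful on POSIX systems.

```python
import os
import pathlib


def write_secret_file(path, secret):
    """Hypothetical helper: write secret to path with permissions 600."""
    path = pathlib.Path(path).expanduser()
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # request mode 600 at creation so the file is never briefly world-readable
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, 'w') as f:
        f.write(secret)
    # the mode passed to os.open() is ignored if the file already existed
    os.chmod(path, 0o600)
    return path
```

A file written this way passes the permission check in Secret() on Linux and macOS.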
My personal pyutil Python package
I maintain all the above functions and a whole lot more in a single repository on GitLab and typically reference it directly in many of my Python modules that are used in production across many systems. In the pyproject.toml file, I add the following:
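[project]
# a direct reference needs the "name @ url" form; the distribution name "pyutil" is assumed here
dependencies = [ 'pyutil @ git+https://gitlab.com/johngoetz/pyutil.git' ]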
The package may be installed directly using pip:
python -mpip install git+https://gitlab.com/johngoetz/pyutil.git