Pathlib - An alternative to the os.path module
March 27, 2019
Introduction
pathlib is a module in the Python standard library for creating and manipulating path objects. Working with paths through the os.path module can be clunky, and pathlib helps by letting you handle paths in an object-oriented way.
The pathlib module contains classes for handling filesystem paths using either Windows or POSIX syntax. The classes fall into two categories: pure paths and concrete paths. Pure paths provide path-handling operations but do not access the actual filesystem; they are only a representation of a path. Put simply, if you need to build a path from strings without touching the filesystem, you can use PurePath to create it.
Concrete paths extend the pure-path API with methods that access the filesystem and can be used, for example, to modify file permissions. They are meant to refer to actual files or folders on disk, although the path they describe does not have to exist yet.
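To make the distinction concrete, here is a minimal sketch. The path /hypothetical/dir/report.txt is made up for illustration and does not need to exist, which is exactly the point of a pure path.

```python
from pathlib import Path, PurePosixPath

# A pure path only manipulates the path string; the path need not exist.
pure = PurePosixPath('/hypothetical/dir/report.txt')  # made-up path
print(pure.name)    # report.txt
print(pure.suffix)  # .txt
print(pure.parent)  # /hypothetical/dir

# Pure paths have no filesystem methods such as exists().
print(hasattr(pure, 'exists'))  # False

# A concrete path adds methods that query the filesystem.
concrete = Path.cwd()
print(concrete.exists())  # True
```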
Pure Paths
PurePath is the base class for all classes in the pathlib module. It is also the base of the Path subclass, which is the class most programmers will use day to day.
Instantiating pathlib.PurePath returns a PurePosixPath or a PureWindowsPath depending on the operating system on which the code is run. This makes it easier to write cross-platform code, because the class takes care of the correct syntax.
Pure paths are also useful if you need to manipulate Unix paths on a Windows system or vice versa; in that case you instantiate PurePosixPath or PureWindowsPath directly. A path can be created from string segments or os.PathLike objects. Let’s take a look at the example below.
from pathlib import PurePath

# Build a path from string segments
path = PurePath('first', 'second', 'third')
print(repr(path))
On Windows the code prints PureWindowsPath('first/second/third'). If you run it on Linux or macOS, the repr is PurePosixPath('first/second/third').
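The platform-specific classes can also be instantiated directly on any operating system, which is how you would handle Windows paths on Linux or the other way round. A short sketch, using made-up example paths:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Both classes can be instantiated on any operating system.
win_path = PureWindowsPath('C:/Users/demo/notes.txt')  # hypothetical path
print(win_path.drive)  # C:
print(win_path.name)   # notes.txt
print(str(win_path))   # C:\Users\demo\notes.txt

posix_path = PurePosixPath('/home/demo/notes.txt')  # hypothetical path
print(posix_path.parts)  # ('/', 'home', 'demo', 'notes.txt')
```

Note that the repr shows forward slashes for both flavours, while str() of a Windows path uses backslashes.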
The syntax for extending an existing path is not obvious at first, but once you get used to it, it proves very readable: the division operator / joins path segments.
from pathlib import PurePath

path = PurePath('first', 'second', 'third')
print(repr(path))
# The / operator appends a segment to the path
extended_path = path / 'fourth'
print(repr(extended_path))
On Windows, the last line of the code above prints PureWindowsPath('first/second/third/fourth').
You can also use the double-dot segment, which refers to the parent folder; it is kept in the path as written. The single-dot segment, which refers to the current folder, is trimmed automatically.
from pathlib import PurePath

path = PurePath('first', 'second', 'third')
print(repr(path))
extended_path = path / 'fourth'
print(repr(extended_path))
# '..' is kept as a literal segment
fifth_path = path / '..' / 'fifth_not_fourth'
print(repr(fifth_path))
# A trailing '.' is dropped
fifth_path = path / '..' / 'fifth_not_fourth' / '.'
print(repr(fifth_path))
On Windows, both fifth_path reprs print PureWindowsPath('first/second/third/../fifth_not_fourth').
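Since pure paths leave '..' in place, one way to get a path without it is to step up with the parent attribute before appending the new segment. The sketch below uses PurePosixPath so that the output is the same on every platform:

```python
from pathlib import PurePosixPath

path = PurePosixPath('first', 'second', 'third')

# '..' stays in the path as a literal segment
with_dots = path / '..' / 'fifth_not_fourth'
print(with_dots)  # first/second/third/../fifth_not_fourth

# .parent removes the last segment lexically
sibling = path.parent / 'fifth_not_fourth'
print(sibling)  # first/second/fifth_not_fourth
```

The parent attribute works purely on the path string. For concrete paths, Path.resolve() can eliminate '..' by consulting the filesystem, which also takes symbolic links into account.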
Concrete paths
Concrete paths, just like pure paths, are cross-platform compatible because they are subclassed from the same base class. Path is the class you will most probably be using.
Using the Path class is pretty straightforward. Let’s delve into an example right away to make the concept of Path clearer.
from pathlib import Path

# Current folder
current_path = Path.cwd()
print(repr(current_path))

# Does the path exist?
path_exists = current_path.exists()
print(path_exists)

# Is the path a directory?
is_directory = current_path.is_dir()
print(is_directory)

# Is the path a file?
is_file = current_path.is_file()
print(is_file)

# Create a dir in the current path
new_dir_path = current_path / 'new_dir'
try:
    new_dir_path.mkdir()
except FileExistsError:
    print('Path already exists')

# Rename a path
modified_dir_path = current_path / 'newer_dir'
try:
    new_dir_path.rename(modified_dir_path)
except FileExistsError:
    print('Path already exists')

# Remove a path (the directory must be empty)
modified_dir_path.rmdir()

# User's home folder
home_path = Path.home()
print(repr(home_path))
The methods that you are going to use the most are probably in the example above. They are the basic operations for manipulating paths.
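As an alternative to wrapping mkdir() in try/except, the method accepts the parents and exist_ok arguments. The following is a small sketch that runs inside a temporary directory; the folder names new_dir and sub_dir are only placeholders.

```python
import tempfile
from pathlib import Path

# Work inside a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / 'new_dir' / 'sub_dir'  # hypothetical folder names

    # parents=True creates missing parent folders,
    # exist_ok=True suppresses FileExistsError if the folder is already there.
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)  # second call is a no-op

    created = nested.is_dir()
    print(created)  # True
```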
Some more interesting methods are Path.iterdir() and Path.glob().
Path.iterdir() creates a generator that yields path objects for the directory contents. It doesn’t descend into the directories that it yields, but you can imagine that with iterdir() and is_dir() you could quite easily create your own walk function.
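Such a walk function could look roughly like the sketch below. The name walk_files and the throwaway directory tree are made up for this example, and the function does not guard against symbolic links that point back up the tree.

```python
import tempfile
from pathlib import Path


def walk_files(root):
    """Yield every file below root, descending into subdirectories."""
    for entry in root.iterdir():
        if entry.is_dir():
            # Recurse into the subdirectory
            yield from walk_files(entry)
        else:
            yield entry


# Build a small throwaway tree to try it on (names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    (root / 'a.txt').touch()
    (root / 'sub' / 'b.txt').touch()

    found = sorted(p.relative_to(root).as_posix() for p in walk_files(root))
    print(found)  # ['a.txt', 'sub/b.txt']
```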
Path.glob() also creates a generator, one that yields all paths matching your search pattern. You could use it to find all Python files in a folder like this:
from pathlib import Path

current_path = Path.cwd()
# Non-recursive: only matches entries directly inside current_path
for py_file in current_path.glob('*.py'):
    print(py_file)
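A pattern such as *.py only matches entries directly inside the folder. If you also want matches from subfolders, Path.rglob() searches recursively. The sketch below compares the two on a temporary tree whose file names are invented for the example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pkg').mkdir()
    (root / 'main.py').touch()
    (root / 'pkg' / 'util.py').touch()
    (root / 'pkg' / 'notes.txt').touch()

    # glob() looks only at the top level, rglob() descends into subfolders
    top_level = sorted(p.name for p in root.glob('*.py'))
    recursive = sorted(p.name for p in root.rglob('*.py'))

    print(top_level)  # ['main.py']
    print(recursive)  # ['main.py', 'util.py']
```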