API Reference

HashFS is a content-addressable file management system: it manages a directory in which each file is saved at a location derived from the hash of the file’s contents.

Typical use cases for this kind of system are ones where:

  • Files are written once and never change (e.g. image storage).
  • It’s desirable to have no duplicate files (e.g. user uploads).
  • File metadata is stored elsewhere (e.g. in a database).

class hashfs.HashFS(root, depth=4, width=1, algorithm='sha256', fmode=0o664, dmode=0o755)

Content addressable file manager.

root

str

Directory path used as root of storage space.

depth

int, optional

Depth of subfolders to create when saving a file.

width

int, optional

Width of each subfolder to create when saving a file.

algorithm

str, optional

Hash algorithm to use when computing file hash. The algorithm should be available in the hashlib module. Defaults to 'sha256'.

fmode

int, optional

File mode permission to set when adding files to directory. Defaults to 0o664 which allows owner/group to read/write and everyone else to read.

dmode

int, optional

Directory mode permission to set for subdirectories. Defaults to 0o755 which allows the owner to read/write/execute and everyone else to read and execute.
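The mode defaults are ordinary Unix permission bits; 0o664 and 0o755 are 436 and 493 when written in decimal. The short check below uses only the standard library (it does not call HashFS) to show which permissions each value grants:

```python
import stat

# The octal literals and their decimal forms are the same integers.
print(0o664 == 436, 0o755 == 493)  # True True

# stat.filemode renders mode bits the way `ls -l` does.
file_perms = stat.filemode(stat.S_IFREG | 0o664)
dir_perms = stat.filemode(stat.S_IFDIR | 0o755)
print(file_perms)  # -rw-rw-r--
print(dir_perms)   # drwxr-xr-x
```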

__contains__(file)

Return whether a given file id or path is contained in the root directory.

__iter__()

Iterate over all files in the root directory.

__len__()

Return count of the number of files in the root directory.

computehash(stream)

Compute hash of file using algorithm.
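As an illustration of hashing a stream with a configurable algorithm, here is a minimal sketch built on hashlib. The function name compute_hash and the chunk_size parameter are assumptions made for this example; this is not HashFS's own implementation.

```python
import hashlib
import io

def compute_hash(stream, algorithm="sha256", chunk_size=8192):
    """Return the hex digest of a binary stream, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    # iter() with a sentinel keeps reading until read() returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()

digest = compute_hash(io.BytesIO(b"some content"))
print(len(digest))  # 64 (hex characters in a sha256 digest)
```

Reading in chunks keeps memory use flat for large files, which matters when the stream is an upload or a multi-gigabyte file.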

corrupted(extensions=True)

Return generator that yields corrupted files as (path, address) where path is the path of the corrupted file and address is the HashAddress of the expected location.
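The idea of a "corrupted" file, one whose location no longer matches its content hash, can be sketched with the standard library alone. The helper find_misplaced below is hypothetical: it assumes SHA-256, reads each file fully into memory, and yields the digest rather than a HashAddress.

```python
import hashlib
import os
import shutil
import tempfile

def find_misplaced(root):
    """Yield (path, digest) for files whose sharded path does not spell their digest."""
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            # Drop the extension, then join the path tokens back into one string.
            stem = os.path.splitext(os.path.relpath(path, root))[0]
            if stem.replace(os.sep, "") != digest:
                yield path, digest

root = tempfile.mkdtemp()
good = hashlib.sha256(b"ok").hexdigest()
os.makedirs(os.path.join(root, good[:2]))
with open(os.path.join(root, good[:2], good[2:]), "wb") as handle:
    handle.write(b"ok")  # stored where its digest says it belongs
with open(os.path.join(root, "stray.bin"), "wb") as handle:
    handle.write(b"moved by hand")  # path does not match content
misplaced = [os.path.basename(path) for path, _digest in find_misplaced(root)]
print(misplaced)  # ['stray.bin']
shutil.rmtree(root)
```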

count()

Return count of the number of files in the root directory.

delete(file)

Delete file using id or path. Remove any empty directories after deleting. No exception is raised if file doesn’t exist.

Parameters: file (str) – Address ID or path of file.

exists(file)

Check whether a given file id or path exists on disk.

files()

Return generator that yields all files in the root directory.

folders()

Return generator that yields all folders in the root directory that contain files.

get(file)

Return HashAddress from given id or path. If file does not refer to a valid file, then None is returned.

Parameters: file (str) – Address ID or path of file.
Returns: File’s hash address.
Return type: HashAddress

haspath(path)

Return whether path is a subdirectory of the root directory.
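A containment check of this kind is commonly written by resolving both paths and comparing their common prefix component by component. The sketch below is one such approach using os.path; the name is_within is an assumption, and note that this version also returns True for the root itself.

```python
import os
import tempfile

def is_within(root, path):
    """Return True if `path` resolves to `root` itself or to a location beneath it."""
    root = os.path.realpath(root)
    path = os.path.realpath(path)
    # commonpath compares whole path components, so "store" and
    # "storefront" are not confused the way a string prefix test would.
    return os.path.commonpath([root, path]) == root

root = os.path.join(tempfile.gettempdir(), "store")
inside = is_within(root, os.path.join(root, "a", "b"))
escaped = is_within(root, os.path.join(root, "..", "elsewhere"))
sibling = is_within(root, root + "front")
print(inside, escaped, sibling)  # True False False
```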

idpath(id, extension='')

Build the file path for a given hash id. Optionally, append a file extension.

makepath(path)

Physically create the folder path on disk.

open(file, mode='rb')

Return open buffer object from given id or path.

Parameters:
  • file (str) – Address ID or path of file.
  • mode (str, optional) – Mode to open file in. Defaults to 'rb'.
Returns:

An io buffer dependent on the mode.

Return type:

Buffer

Raises:

IOError – If file doesn’t exist.

put(file, extension=None)

Store contents of file on disk using its content hash for the address.

Parameters:
  • file (mixed) – Readable object or path to file.
  • extension (str, optional) – Optional extension to append to file when saving.
Returns:

File’s hash address.

Return type:

HashAddress
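To make the content-addressed write concrete, the following standard-library sketch stores bytes at a path derived from their digest and reports whether the content was already present. The function put_bytes, its tuple return value, and the fixed choice of SHA-256 are assumptions for illustration only, not the HashFS implementation.

```python
import hashlib
import os
import shutil
import tempfile

def put_bytes(root, data, depth=4, width=1, extension=""):
    """Store `data` under `root` at a path built from its SHA-256 digest.

    Returns (digest, absolute_path, is_duplicate).
    """
    digest = hashlib.sha256(data).hexdigest()
    # Split the digest into `depth` folder names of `width` characters,
    # followed by the remaining characters as the file name.
    tokens = [digest[i * width:(i + 1) * width] for i in range(depth)]
    tokens.append(digest[depth * width:])
    path = os.path.join(root, *tokens) + extension
    if os.path.isfile(path):
        return digest, path, True  # same content already stored
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)
    return digest, path, False

root = tempfile.mkdtemp()
first = put_bytes(root, b"some content", extension=".txt")
second = put_bytes(root, b"some content", extension=".txt")
stored = os.path.isfile(first[1])
print(first[2], second[2])  # False True
shutil.rmtree(root)
```

Because the path depends only on the content, writing the same bytes twice resolves to the same location, which is how duplicates are avoided.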

realpath(file)

Attempt to determine the real path of a file id or path through successive checking of candidate paths. If the real path is stored with an extension, the path is considered a match if the basename matches the expected file path of the id.

relpath(path)

Return path relative to the root directory.

remove_empty(subpath)

Successively remove all empty folders starting with subpath and proceeding “up” through directory tree until reaching the root folder.
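The upward cleanup described here can be sketched as a loop that removes a folder, then moves to its parent, and stops at the first non-empty folder or at the root. This is a standalone illustration; the two-argument signature (root passed explicitly) is an assumption of the sketch.

```python
import os
import tempfile

def remove_empty(root, subpath):
    """Remove empty directories from `subpath` upward, never removing `root`."""
    root = os.path.realpath(root)
    current = os.path.realpath(subpath)
    # Continue only while `current` is strictly inside `root`.
    while current != root and current.startswith(root + os.sep):
        if os.listdir(current):
            break  # not empty: stop climbing
        os.rmdir(current)
        current = os.path.dirname(current)

root = tempfile.mkdtemp()
leaf = os.path.join(root, "a", "b", "c")
os.makedirs(leaf)
remove_empty(root, leaf)
remaining = os.listdir(root)
root_survives = os.path.isdir(root)
print(remaining, root_survives)  # [] True
os.rmdir(root)
```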

repair(extensions=True)

Repair any file locations whose content address doesn’t match its file path.

shard(id)

Shard content ID into subfolders.
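The relationship between depth, width, and the resulting folder names is easiest to see on a short ID. The sketch below is a standalone function written for illustration, assuming the ID is split into depth tokens of width characters followed by the remainder:

```python
def shard(digest, depth=4, width=1):
    """Split `digest` into `depth` tokens of `width` characters plus the remainder."""
    tokens = [digest[i * width:(i + 1) * width] for i in range(depth)]
    tokens.append(digest[depth * width:])
    # Drop empty tokens, which appear when the digest is shorter than depth * width.
    return [token for token in tokens if token]

print(shard("abcdef0123456789"))        # ['a', 'b', 'c', 'd', 'ef0123456789']
print(shard("abcdef0123456789", 2, 3))  # ['abc', 'def', '0123456789']
```

With the defaults (depth=4, width=1), each of the first four hex characters becomes a folder, so each directory level has at most 16 subfolders.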

size()

Return the total size in bytes of all files in the root directory.
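For reference, totals like these can be computed over any directory tree with os.walk. The helpers total_size and count_files below are hypothetical standard-library equivalents, shown only to clarify what is being measured:

```python
import os
import shutil
import tempfile

def total_size(root):
    """Sum the sizes in bytes of every file found under `root`."""
    return sum(
        os.path.getsize(os.path.join(folder, name))
        for folder, _subfolders, names in os.walk(root)
        for name in names
    )

def count_files(root):
    """Count every file found under `root`."""
    return sum(len(names) for _folder, _subfolders, names in os.walk(root))

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
with open(os.path.join(root, "a", "one.bin"), "wb") as handle:
    handle.write(b"abc")
with open(os.path.join(root, "a", "b", "two.bin"), "wb") as handle:
    handle.write(b"12345")
size_in_bytes = total_size(root)
file_count = count_files(root)
print(size_in_bytes, file_count)  # 8 2
shutil.rmtree(root)
```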

unshard(path)

Unshard path to determine hash value.
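Reversing the sharding amounts to dropping any extension and joining the path tokens back together. A minimal sketch, assuming the input is already relative to the root directory (the standalone function below is illustrative, not the library method):

```python
import os

def unshard(relative_path):
    """Recover a hash ID from a sharded relative path, ignoring any extension."""
    stem, _extension = os.path.splitext(relative_path)
    return stem.replace(os.sep, "")

path = os.path.join("a", "b", "c", "d", "ef0123456789.txt")
recovered = unshard(path)
print(recovered)  # abcdef0123456789
```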

class hashfs.HashAddress

File address containing file’s path on disk and its content hash ID.

id

str

Hash ID (hexdigest) of file contents.

relpath

str

Relative path location to HashFS.root.

abspath

str

Absolute path location of file on disk.

is_duplicate

boolean, optional

Whether the hash address created was a duplicate of a previously existing file. Can only be True after a put operation. Defaults to False.
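One way to model a record with these four attributes is an immutable named tuple whose last field defaults to False. The class name Address and the sample values below are hypothetical and serve only to show the shape of the data:

```python
from typing import NamedTuple

class Address(NamedTuple):
    """Illustrative stand-in for a hash address record."""
    id: str
    relpath: str
    abspath: str
    is_duplicate: bool = False

address = Address("abcdef", "a/b/c/d/ef", "/data/a/b/c/d/ef")
print(address.is_duplicate)  # False

# Named tuples are immutable; _replace returns a modified copy.
duplicate = address._replace(is_duplicate=True)
print(duplicate.is_duplicate)  # True
```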