API Reference

HashFS is a content-addressable file management system. What does that mean? Simply that HashFS manages a directory where each file is saved at a path derived from the hash of its contents.

Typical use cases for this kind of system are ones where:

  • Files are written once and never change (e.g. image storage).
  • It’s desirable to have no duplicate files (e.g. user uploads).
  • File metadata is stored elsewhere (e.g. in a database).

class hashfs.HashFS(root, depth=4, width=1, algorithm='sha256', fmode=436, dmode=493)

Content addressable file manager.

root

str – Directory path used as root of storage space.

depth

int, optional – Number of nested subfolder levels to create when saving a file. Defaults to 4.

width

int, optional – Width (number of hash characters) of each subfolder name to create when saving a file. Defaults to 1.

algorithm

str, optional – Name of the hash algorithm used to compute file hashes. It must be available in the hashlib module. Defaults to 'sha256'.

fmode

int, optional – File mode permission to set when adding files to the directory. Defaults to 0o664, which allows the owner and group to read and write and everyone else to read.

dmode

int, optional – Directory mode permission to set for subdirectories. Defaults to 0o755, which allows the owner to read, write, and execute (traverse) and everyone else to read and execute.
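The signature above shows fmode=436 and dmode=493 because the defaults are printed as plain integers; they are the same values as the octal masks 0o664 and 0o755. The standard-library function stat.filemode renders each mask in ls -l notation:

```python
import stat

# 436 and 493 are the decimal spellings of the octal masks 0o664 and 0o755.
print(oct(436), oct(493))  # 0o664 0o755

# stat.filemode formats a mode the way `ls -l` does.
print(stat.filemode(stat.S_IFREG | 436))  # -rw-rw-r--
print(stat.filemode(stat.S_IFDIR | 493))  # drwxr-xr-x
```

Note that group members get write access to files (rw-) but not to directories (r-x) under these defaults.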

computehash(stream)

Compute the hash of a file stream using the configured algorithm.
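computehash takes a stream rather than a byte string, which suggests the hash can be built up incrementally without loading the whole file into memory. A minimal sketch of that idea with hashlib; the function name compute_hash and the chunk_size parameter are illustrative assumptions, not part of HashFS:

```python
import hashlib
import io


def compute_hash(stream, algorithm="sha256", chunk_size=64 * 1024):
    """Return the hex digest of everything readable from a binary stream.

    Sketch only: reading fixed-size chunks is an illustrative choice,
    not necessarily what HashFS.computehash does internally.
    """
    hashobj = hashlib.new(algorithm)  # raises ValueError for unknown names
    # Read until stream.read() returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hashobj.update(chunk)
    return hashobj.hexdigest()


digest = compute_hash(io.BytesIO(b"hello world"))
print(digest)
```

The result is identical to hashing the whole content at once, so the chunk size affects only memory use, not the address.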

corrupted(extensions=True)

Return a generator that yields corrupted files, i.e. files whose location no longer matches the hash of their contents, as (path, address) tuples, where path is the path of the corrupted file and address is the HashAddress of its expected location.

count()

Return the number of files in the root directory.

delete(file)

Delete a file by id or path, then remove any directories left empty. No exception is raised if the file doesn’t exist.

Parameters:
  • file (str) – Address ID or path of file.

exists(file)

Check whether a given file id or path exists on disk.

files()

Return a generator that yields all files in the root directory.

folders()

Return a generator that yields all folders in the root directory that contain files.
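files(), folders(), and the count() and size() totals all describe a walk over the storage tree. The sketch below reproduces that behaviour with os.walk; the helper names are hypothetical, and yielding absolute paths is an assumption about what files() returns:

```python
import os


def iter_files(root):
    """Yield the absolute path of every file under root (cf. files())."""
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            yield os.path.abspath(os.path.join(folder, name))


def iter_folders(root):
    """Yield each folder under root that directly contains files (cf. folders())."""
    for folder, _subfolders, filenames in os.walk(root):
        if filenames:
            yield os.path.abspath(folder)


def count_files(root):
    """Number of files under root (cf. count())."""
    return sum(1 for _ in iter_files(root))


def total_size(root):
    """Total size in bytes of all files under root (cf. size())."""
    return sum(os.path.getsize(path) for path in iter_files(root))
```

Because the helpers are generators, count_files and total_size never build the full file list in memory.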

get(file)

Return the HashAddress for the given id or path. If file does not refer to a valid file, None is returned.

Parameters:
  • file (str) – Address ID or path of file.
Returns:

File’s hash address.

Return type:

HashAddress

haspath(path)

Return whether path is located under the root directory.

idpath(id, extension='')

Build the file path for a given hash id. Optionally, append a file extension.

makepath(path)

Physically create the folder path on disk.

open(file, mode='rb')

Return an open buffer object for the given id or path.

Parameters:
  • file (str) – Address ID or path of file.
  • mode (str, optional) – Mode to open file in. Defaults to 'rb'.
Returns:

An io buffer dependent on the mode.

Return type:

Buffer

Raises:

IOError – If file doesn’t exist.

put(file, extension=None)

Store contents of file on disk using its content hash for the address.

Parameters:
  • file (mixed) – Readable object or path to file.
  • extension (str, optional) – Optional extension to append to file when saving.
Returns:

File’s hash address.

Return type:

HashAddress
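The write path that put describes can be sketched in plain Python: hash the content, derive a sharded path, and write only if nothing is stored there yet. This is a simplified sketch, not HashFS’s implementation. put_bytes, Address, and the temp-file-then-rename step are assumptions; it accepts only bytes (not a stream or path), assumes sha256 with the default depth and width, and does not apply fmode or dmode:

```python
import hashlib
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for hashfs.HashAddress, using the fields documented for it on this page.
Address = namedtuple("Address", ["id", "relpath", "abspath", "is_duplicate"])


def put_bytes(root, data, extension="", depth=4, width=1):
    """Store data under root at a path derived from its SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    # Shard the digest: `depth` folders of `width` characters, then the rest.
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    parts.append(digest[depth * width:])
    if extension and not extension.startswith("."):
        extension = "." + extension
    abspath = os.path.join(root, *parts) + (extension or "")
    is_duplicate = os.path.isfile(abspath)
    if not is_duplicate:
        os.makedirs(os.path.dirname(abspath), exist_ok=True)
        # Write to a temporary file in root, then rename it into place, so a
        # crash never leaves a half-written file at a content address.
        fd, tmp_path = tempfile.mkstemp(dir=root)
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, abspath)
    return Address(digest, os.path.relpath(abspath, root), abspath, is_duplicate)
```

Calling put_bytes twice with the same bytes returns the same abspath, and the second result has is_duplicate set to True, which is the deduplication behaviour the introduction describes.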

realpath(file)

Attempt to determine the real path of a file id or path through successive checking of candidate paths. If the real path is stored with an extension, the path is considered a match if the basename matches the expected file path of the id.
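The description above does not say which candidates are tried or in what order, so the order below is an assumption for illustration. The sketch tries the argument as a path, then as a path relative to root, then as a hash id, and finally globs for the id’s path with any extension. find_real_path and its to_idpath callable (standing in for idpath()) are hypothetical:

```python
import glob
import os


def find_real_path(root, file, to_idpath):
    """Return the first existing path that `file` could refer to, or None."""
    candidates = (
        file,                      # already a usable path
        os.path.join(root, file),  # path relative to root
        to_idpath(file),           # hash id mapped to its sharded path
    )
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    # A file stored with an extension matches on its extensionless name.
    matches = glob.glob(glob.escape(to_idpath(file)) + ".*")
    return matches[0] if matches else None
```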

relpath(path)

Return path relative to the root directory.

remove_empty(subpath)

Successively remove all empty folders, starting with subpath and proceeding “up” through the directory tree until reaching the root folder.
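A minimal sketch of this upward walk, assuming the walk should stop at the first folder that is missing or still has entries and must never delete root itself; remove_empty_dirs is a hypothetical name:

```python
import os


def remove_empty_dirs(root, subpath):
    """Delete subpath and its empty ancestors, never deleting root itself."""
    root = os.path.abspath(root)
    path = os.path.abspath(subpath)
    # Only act on folders strictly inside root.
    while path != root and path.startswith(root + os.sep):
        if not os.path.isdir(path) or os.listdir(path):
            break  # missing or non-empty: everything above is left alone
        os.rmdir(path)
        path = os.path.dirname(path)
```

After a delete, calling this on the deleted file’s folder prunes shard folders that no longer hold anything.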

repair(extensions=True)

Repair any file locations whose content address doesn’t match its file path.
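corrupted() and repair() both come down to recomputing each file’s hash and comparing its current location with the location that hash implies. This sketch assumes sha256 with the default depth=4, width=1 layout, keeps each file’s existing extension, reads whole files into memory, and does not model the extensions flag or clean up emptied folders; expected_path, find_corrupted, and repair_tree are hypothetical names:

```python
import hashlib
import os
import shutil


def expected_path(root, digest, extension="", depth=4, width=1):
    """Path where content with this digest should live under root."""
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    parts.append(digest[depth * width:])
    return os.path.join(root, *parts) + extension


def find_corrupted(root):
    """Yield (path, expected) for every file stored at the wrong address."""
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            target = expected_path(root, digest, os.path.splitext(name)[1])
            if path != target:
                yield path, target


def repair_tree(root):
    """Move misplaced files to their expected paths; return the new paths."""
    repaired = []
    # Materialize the list first so moves don't disturb the ongoing walk.
    for path, target in list(find_corrupted(root)):
        if os.path.exists(target):
            os.remove(path)  # the same content is already stored at the target
        else:
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.move(path, target)
        repaired.append(target)
    return repaired
```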

shard(id)

Shard content ID into subfolders.
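Assuming the default depth=4 and width=1, the first four hex characters of an id each become one folder level and the remainder becomes the file name, giving at most 16^4 = 65,536 leaf folders. The sketch below shows that splitting rule and how an idpath-style helper might join the pieces under root; both functions and the exact splitting rule are assumptions for illustration:

```python
import os


def shard(digest, depth=4, width=1):
    """Split a hex digest into `depth` tokens of `width` characters plus the rest."""
    tokens = [digest[i * width:(i + 1) * width] for i in range(depth)]
    tokens.append(digest[depth * width:])
    return [token for token in tokens if token]  # drop empties for short ids


def idpath(root, digest, extension="", depth=4, width=1):
    """Join the shards under root and append an optional extension."""
    extension = extension or ""
    if extension and not extension.startswith(os.extsep):
        extension = os.extsep + extension  # accept "png" as well as ".png"
    return os.path.join(root, *shard(digest, depth, width)) + extension


print(shard("e3b0c442", depth=4, width=1))  # ['e', '3', 'b', '0', 'c442']
print(shard("e3b0c442", depth=2, width=2))  # ['e3', 'b0', 'c442']
print(idpath("/data", "e3b0c442", "png"))   # /data/e/3/b/0/c442.png on POSIX
```

Larger depth or width values spread files over more folders, which keeps individual directories small at the cost of deeper paths.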

size()

Return the total size in bytes of all files in the root directory.

unshard(path)

Unshard path to determine hash value.
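Unsharding reverses the layout: take the path relative to root, drop any extension, and concatenate the folder names. A sketch, assuming shard folder names never contain a dot (true for hex digests); unshard_path is hypothetical, and it uses os.path.commonpath rather than a string-prefix test so that a sibling such as /data2 is not mistaken for a path under /data:

```python
import os


def unshard_path(root, path):
    """Recover the hash id encoded in a sharded path under root."""
    root = os.path.abspath(root)
    path = os.path.abspath(path)
    if os.path.commonpath([root, path]) != root:
        raise ValueError(f"{path!r} is not located under {root!r}")
    stem, _extension = os.path.splitext(os.path.relpath(path, root))
    return stem.replace(os.sep, "")


print(unshard_path("/data", "/data/e/3/b/0/c442.png"))  # e3b0c442
```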

class hashfs.HashAddress

File address containing the file’s path on disk and its content hash ID.

id

str – Hash ID (hexdigest) of file contents.

relpath

str – Path of the file relative to HashFS.root.

abspath

str – Absolute path of the file on disk.

is_duplicate

boolean, optional – Whether the hash address created was a duplicate of a previously existing file. Can only be True after a put operation. Defaults to False.
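The attributes above map naturally onto a named tuple. The following is a hypothetical re-creation, not the hashfs class itself, showing how is_duplicate can default to False when only the first three fields are supplied (the defaults argument needs Python 3.7 or later):

```python
from collections import namedtuple

# Hypothetical re-creation; only the field names come from this reference.
HashAddress = namedtuple(
    "HashAddress", ["id", "relpath", "abspath", "is_duplicate"], defaults=[False]
)

address = HashAddress("e3b0c442", "e/3/b/0/c442", "/data/e/3/b/0/c442")
print(address.is_duplicate)  # False
print(address._replace(is_duplicate=True).is_duplicate)  # True
```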