narrow_down.storage module#
Base classes and interfaces for storage.
- exception narrow_down.storage.TooLowStorageLevel[source]#
Bases: Exception
Raised if a feature is used for which a higher storage level is needed.
- class narrow_down.storage.StorageLevel(value)[source]#
Bases: Flag
Detail level of document persistence.
- Minimal = 1#
Minimal storage level. Only store the necessary data to perform the search.
- Fingerprint = 2#
In addition to Minimal, also store the fingerprint, e.g. the Minhashes.
- Document = 4#
Store the whole inserted document internally.
- Full = 7#
Store everything.
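Because StorageLevel is a Flag, levels can be combined and tested with bitwise operators. The following is a minimal sketch that re-declares the enum and the exception locally with the values documented above, so it runs without the library installed. The helper `require_level` is hypothetical and only illustrates how TooLowStorageLevel might be used; it is not part of the documented API.

```python
from enum import Flag


class StorageLevel(Flag):
    """Local stand-in mirroring the member values documented above."""

    Minimal = 1
    Fingerprint = 2
    Document = 4
    Full = 7


class TooLowStorageLevel(Exception):
    """Local stand-in for the exception documented above."""


def require_level(configured: StorageLevel, needed: StorageLevel) -> None:
    """Hypothetical guard: raise if `configured` does not include `needed`."""
    if needed not in configured:
        raise TooLowStorageLevel(f"{needed} required, but store uses {configured}")


# Full (7) is the bitwise OR of the three single-bit levels (1 | 2 | 4).
combined = StorageLevel.Minimal | StorageLevel.Fingerprint | StorageLevel.Document
is_full = combined == StorageLevel.Full

# Membership tests check whether one level's bits are contained in another.
has_document = StorageLevel.Document in StorageLevel.Full
minimal_has_fingerprint = StorageLevel.Fingerprint in StorageLevel.Minimal
```

With these definitions, `require_level(StorageLevel.Minimal, StorageLevel.Document)` raises TooLowStorageLevel, while `require_level(StorageLevel.Full, StorageLevel.Fingerprint)` returns silently.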
- class narrow_down.storage.Fingerprint#
Type representing the result of a minhashing operation.
alias of ndarray[Any, dtype[uint32]]
- class narrow_down.storage.StoredDocument(id_=None, document=None, exact_part=None, fingerprint=None, data=None)[source]#
Bases: object
Data object combining all possible fields of a document stored.
- Parameters:
id_ (int | None) –
document (str | None) –
exact_part (str | None) –
fingerprint (Fingerprint | None) –
data (str | None) –
- document: str | None = None#
The actual content to use for fuzzy matching, e.g. a full unprocessed sentence.
- fingerprint: Fingerprint | None = None#
A fuzzy fingerprint of the document, e.g. a Minhash.
- data: str | None = None#
Payload to persist together with the document in the internal data structures.
- serialize(storage_level)[source]#
Serialize a document to bytes.
- Parameters:
storage_level (StorageLevel) –
- Return type:
bytes
- without(*attributes)[source]#
Create a copy with the specified attributes left out.
- Parameters:
attributes (str) – The names of the attributes to leave empty
- Returns:
A copy of the StoredDocument in which every attribute named in attributes is left out, i.e. reset to its default value (None).
- Return type:
StoredDocument
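The copy semantics described for without() can be illustrated with a small stand-in. This is a sketch, not the library's implementation: `DocSketch` is a hypothetical dataclass that carries only some of the fields listed above (the fingerprint is omitted to avoid a NumPy dependency), and it relies on `dataclasses.replace` to build the copy.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DocSketch:
    """Hypothetical stand-in with a subset of the StoredDocument fields."""

    id_: Optional[int] = None
    document: Optional[str] = None
    exact_part: Optional[str] = None
    data: Optional[str] = None

    def without(self, *attributes: str) -> "DocSketch":
        """Return a copy in which the named attributes are reset to None."""
        return replace(self, **{name: None for name in attributes})


original = DocSketch(id_=1, document="some text", data="payload")
stripped = original.without("document", "data")
```

Here `stripped` keeps `id_=1` but has `document` and `data` set to None, while `original` is left unchanged.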
- class narrow_down.storage.StorageBackend[source]#
Bases: ABC
Storage backend for a SimilarityStore.
- abstract async insert_document(document, document_id=None)[source]#
Add the data of a document to the storage and return its ID.
- abstract async remove_document(document_id)[source]#
Remove a document given by ID from the list of documents.
- Parameters:
document_id (int) –
- abstract async add_document_to_bucket(bucket_id, document_hash, document_id)[source]#
Link a document to a bucket.
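To show how the three abstract coroutines listed above fit together, here is a minimal dictionary-based sketch. It is written against a local stand-in base class (`BackendSketch`) rather than the real StorageBackend, which may declare further abstract methods not shown in this listing. The parameter types (`bytes` for the document, `int` for bucket ID and hash) and the ID allocation strategy are assumptions made for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


class BackendSketch(ABC):
    """Local stand-in exposing only the three abstract methods listed above."""

    @abstractmethod
    async def insert_document(self, document: bytes, document_id: Optional[int] = None) -> int: ...

    @abstractmethod
    async def remove_document(self, document_id: int) -> None: ...

    @abstractmethod
    async def add_document_to_bucket(self, bucket_id: int, document_hash: int, document_id: int) -> None: ...


class DictBackend(BackendSketch):
    """Hypothetical backend that keeps everything in plain dictionaries."""

    def __init__(self) -> None:
        self._documents: Dict[int, bytes] = {}
        self._buckets: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
        self._next_id = 0

    async def insert_document(self, document: bytes, document_id: Optional[int] = None) -> int:
        # Allocate a fresh ID unless the caller supplies one (assumed behavior).
        if document_id is None:
            document_id = self._next_id
        self._next_id = max(self._next_id, document_id + 1)
        self._documents[document_id] = document
        return document_id

    async def remove_document(self, document_id: int) -> None:
        # Raises KeyError for an unknown ID; a real backend may choose differently.
        del self._documents[document_id]

    async def add_document_to_bucket(self, bucket_id: int, document_hash: int, document_id: int) -> None:
        self._buckets[(bucket_id, document_hash)].add(document_id)


async def demo() -> Tuple[DictBackend, int, int]:
    backend = DictBackend()
    first = await backend.insert_document(b"doc one")
    second = await backend.insert_document(b"doc two", document_id=10)
    await backend.add_document_to_bucket(0, 12345, first)
    await backend.remove_document(second)
    return backend, first, second


backend, first, second = asyncio.run(demo())
```

All methods are coroutines, so callers have to await them or drive them through an event loop, as `asyncio.run(demo())` does here.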
- class narrow_down.storage.InMemoryStore[source]#
Bases: StorageBackend
Rust implementation of InMemoryStore.
- serialize()[source]#
Serialize the data into MessagePack bytes so that they can be persisted elsewhere.
- Return type:
bytes
- to_file(file_path)[source]#
Serialize the data into a MessagePack file at the given path.
- Parameters:
file_path (str) –
- classmethod deserialize(msgpack)[source]#
Deserialize an InMemoryStore object from messagepack.
- Parameters:
msgpack (bytes) –
- Return type:
InMemoryStore
- classmethod from_file(file_path)[source]#
Deserialize an InMemoryStore object from the given messagepack file.
- Parameters:
file_path (str) –
- Return type:
InMemoryStore
- async insert_document(document, document_id=None)[source]#
Add the data of a document to the storage and return its ID.
- async remove_document(document_id)[source]#
Remove a document given by ID from the list of documents.
- Parameters:
document_id (int) –
- async add_document_to_bucket(bucket_id, document_hash, document_id)[source]#
Link a document to a bucket.