narrow_down.scylladb module

Storage backend based on ScyllaDB.

ScyllaDB is a low-latency distributed key-value store, compatible with the Apache Cassandra protocol. For details see https://www.scylladb.com/.

class narrow_down.scylladb.ScyllaDBStore(cluster_or_session, keyspace, table_prefix=None)

Bases: StorageBackend

Storage backend for a SimilarityStore using ScyllaDB.

Parameters:
  • cluster_or_session (cassandra.cluster.Cluster | cassandra.cluster.Session) –

  • keyspace (str) –

  • table_prefix (str | None) –

__init__(cluster_or_session, keyspace, table_prefix=None)

Create a new, empty store in a ScyllaDB keyspace or connect to an existing one.

Parameters:
  • cluster_or_session (cassandra.cluster.Cluster | cassandra.cluster.Session) – Either a Cassandra Cluster object or a Session object.

  • keyspace (str) – Name of the keyspace to use.

  • table_prefix (str | None) – A prefix to use for all table names in the database.

Raises:

ValueError – When the keyspace name is invalid.

Return type:

None

async initialize()

Initialize the tables in the ScyllaDB keyspace.

Returns:

self

Return type:

ScyllaDBStore
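The constructor and initialize() described above might be combined as in the following minimal sketch. It assumes the cassandra driver and narrow_down are installed, that a ScyllaDB node is reachable, and that the keyspace already exists. The helper name open_store and its arguments are hypothetical and not part of the library.

```python
async def open_store(contact_points, keyspace, table_prefix=None):
    """Hypothetical helper: connect to a cluster and return an initialized store."""
    # Imported lazily so this sketch can be loaded without the driver installed.
    from cassandra.cluster import Cluster
    from narrow_down.scylladb import ScyllaDBStore

    # Assumption: the contact points lead to a reachable ScyllaDB node.
    cluster = Cluster(contact_points=contact_points)
    session = cluster.connect()

    # The constructor accepts either the cluster or the session object.
    store = ScyllaDBStore(session, keyspace, table_prefix=table_prefix)

    # initialize() creates the tables and returns the store itself.
    return await store.initialize()
```

From synchronous code this could be called as asyncio.run(open_store(["localhost"], "test_ks")), where the host and keyspace name are placeholders.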

async insert_setting(key, value)

Store a setting as key-value pair.

Parameters:
  • key (str) – The identifier of the setting.

  • value (str) – The value to store under this key.

async query_setting(key)

Query a setting with the given key.

Parameters:

key (str) – The identifier of the setting.

Returns:

The value as a string, or None if the key does not exist or the storage is uninitialized.

Raises:

cassandra.DriverException – In case the database query fails for any reason.

Return type:

str | None
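The settings contract above (a stored value comes back as a string, a missing key yields None) can be illustrated with a small in-memory stand-in. This is only a sketch of the documented behaviour, not the ScyllaDB implementation. The class InMemorySettings and the setting name are hypothetical, and overwriting an existing key is an assumption.

```python
import asyncio
from typing import Dict, Optional


class InMemorySettings:
    """Hypothetical stand-in that mimics the settings methods; no database involved."""

    def __init__(self) -> None:
        self._settings: Dict[str, str] = {}

    async def insert_setting(self, key: str, value: str) -> None:
        # Assumption: writing an existing key replaces the old value.
        self._settings[key] = value

    async def query_setting(self, key: str) -> Optional[str]:
        # A missing key yields None instead of raising an exception.
        return self._settings.get(key)


async def demo():
    store = InMemorySettings()
    await store.insert_setting("similarity_threshold", "0.75")
    found = await store.query_setting("similarity_threshold")
    missing = await store.query_setting("no_such_key")
    return found, missing
```

Because values are plain strings, a caller that stores numbers or structured data has to serialize them before insert_setting and parse them after query_setting.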

async insert_document(document, document_id=None)

Add the data of a document to the storage and return its ID.

Parameters:
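  • document (bytes) – The data of the document to store.

  • document_id (int | None) – Optional ID under which to store the document. If None, an ID is created by the storage.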

Return type:

int

async query_document(document_id)

Get the data belonging to a document.

Parameters:

document_id (int) – The id of the document. This ID is created and returned by the insert_document method.

Returns:

The document stored under document_id, as a bytes object.

Raises:

KeyError – If the document is not stored.

Return type:

bytes

async query_documents(document_ids)

Get the data belonging to multiple documents.

Parameters:

document_ids (List[int]) – The IDs of the documents to fetch.

Returns:

The documents stored under the given IDs, as a list of bytes objects.

Raises:

KeyError – If no document was found for at least one of the ids.

Return type:

List[bytes]
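The document methods above can be sketched with an in-memory stand-in that follows the documented contract: insert_document returns the ID, and both query methods raise KeyError for an unknown ID. The class InMemoryDocuments is hypothetical. How the real backend chooses an ID when document_id is None is not stated here, so the counter below is an assumption, as is returning results in the order of the requested IDs.

```python
import asyncio
import itertools
from typing import Dict, List, Optional


class InMemoryDocuments:
    """Hypothetical stand-in that mimics the document methods; no database involved."""

    def __init__(self) -> None:
        self._documents: Dict[int, bytes] = {}
        self._counter = itertools.count(1)

    async def insert_document(self, document: bytes, document_id: Optional[int] = None) -> int:
        if document_id is None:
            # Assumption: take the next counter value that is not in use yet.
            document_id = next(i for i in self._counter if i not in self._documents)
        self._documents[document_id] = document
        return document_id

    async def query_document(self, document_id: int) -> bytes:
        # Dictionary lookup raises KeyError when the ID is unknown.
        return self._documents[document_id]

    async def query_documents(self, document_ids: List[int]) -> List[bytes]:
        # Raises KeyError as soon as one of the IDs is unknown.
        return [self._documents[doc_id] for doc_id in document_ids]


async def demo():
    store = InMemoryDocuments()
    first_id = await store.insert_document(b"first")
    second_id = await store.insert_document(b"second", document_id=42)
    docs = await store.query_documents([second_id, first_id])
    try:
        await store.query_document(999)
        raised = False
    except KeyError:
        raised = True
    return (first_id, second_id), docs, raised
```

A caller that cannot rule out missing IDs would wrap query_document or query_documents in a try/except KeyError block, as demo() does.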

async remove_document(document_id)

Remove a document given by ID from the list of documents.

Parameters:

document_id (int) – The ID of the document to remove.

async add_document_to_bucket(bucket_id, document_hash, document_id)

Link a document to a bucket.

Parameters:
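  • bucket_id (int) – The identifier of the bucket.

  • document_hash (int) – The hash value of the document within this bucket.

  • document_id (int) – The ID of the document to link.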

async query_ids_from_bucket(bucket_id, document_hash)

Get all document IDs stored in a bucket for a certain hash value.

Parameters:
  • bucket_id (int) – The identifier of the bucket.

  • document_hash (int) – The hash value to look up within this bucket.

Return type:

Iterable[int]

async remove_id_from_bucket(bucket_id, document_hash, document_id)

Remove a document from a bucket.

Parameters:
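  • bucket_id (int) – The identifier of the bucket.

  • document_hash (int) – The hash value of the document within this bucket.

  • document_id (int) – The ID of the document to remove from the bucket.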
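The three bucket methods above can be pictured as operations on a mapping from (bucket_id, document_hash) to a set of document IDs. The following in-memory stand-in is a sketch of that idea, not the ScyllaDB table layout. The class InMemoryBuckets is hypothetical, and returning a sorted list as well as silently ignoring the removal of an unknown ID are assumptions.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


class InMemoryBuckets:
    """Hypothetical stand-in that mimics the bucket methods; no database involved."""

    def __init__(self) -> None:
        self._buckets: DefaultDict[Tuple[int, int], Set[int]] = defaultdict(set)

    async def add_document_to_bucket(self, bucket_id: int, document_hash: int, document_id: int) -> None:
        # Each (bucket_id, document_hash) pair owns its own set of document IDs.
        self._buckets[(bucket_id, document_hash)].add(document_id)

    async def query_ids_from_bucket(self, bucket_id: int, document_hash: int) -> Iterable[int]:
        # An unknown pair yields an empty result; sorting only makes the output stable.
        return sorted(self._buckets.get((bucket_id, document_hash), ()))

    async def remove_id_from_bucket(self, bucket_id: int, document_hash: int, document_id: int) -> None:
        # Assumption: removing an ID that is not linked is a no-op.
        self._buckets.get((bucket_id, document_hash), set()).discard(document_id)


async def demo():
    store = InMemoryBuckets()
    await store.add_document_to_bucket(0, 111, 1)
    await store.add_document_to_bucket(0, 111, 2)
    await store.add_document_to_bucket(1, 111, 3)
    before = await store.query_ids_from_bucket(0, 111)
    await store.remove_id_from_bucket(0, 111, 1)
    after = await store.query_ids_from_bucket(0, 111)
    other = await store.query_ids_from_bucket(1, 111)
    empty = await store.query_ids_from_bucket(0, 999)
    return before, after, other, empty
```

In this picture the same hash value in two different buckets stays separate, which is why every method takes both bucket_id and document_hash.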