module

dframeio.abstract

Abstract interfaces for all storage backends

Classes
class

dframeio.abstract.AbstractDataFrameReader()

Interface for reading dataframes from different storage drivers

Methods
  • read_to_dict(source, columns, row_filter, limit, sample, drop_duplicates) (dict(str: list)) Read data into a dict of named columns
  • read_to_pandas(source, columns, row_filter, limit, sample, drop_duplicates) (DataFrame) Read data into a pandas.DataFrame
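A minimal sketch of how a storage backend might implement this interface. To stay self-contained it defines ReaderInterface, a hypothetical stand-in ABC with the same method names, instead of importing dframeio; InMemoryReader and its tables argument are hypothetical too. The sketch assumes read_to_pandas can be derived from read_to_dict, and it only supports the columns and limit arguments.

```python
import abc
from typing import Any, Dict, List, Optional

Columns = Dict[str, List[Any]]


class ReaderInterface(abc.ABC):
    """Hypothetical stand-in mirroring dframeio.abstract.AbstractDataFrameReader."""

    @abc.abstractmethod
    def read_to_pandas(self, source: str, columns: Optional[List[str]] = None,
                       row_filter: Optional[str] = None, limit: int = -1,
                       sample: int = -1, drop_duplicates: bool = False):
        ...

    @abc.abstractmethod
    def read_to_dict(self, source: str, columns: Optional[List[str]] = None,
                     row_filter: Optional[str] = None, limit: int = -1,
                     sample: int = -1, drop_duplicates: bool = False) -> Columns:
        ...


class InMemoryReader(ReaderInterface):
    """Hypothetical backend: `source` is the name of a table held in a dict."""

    def __init__(self, tables: Dict[str, Columns]):
        self._tables = tables

    def read_to_dict(self, source, columns=None, row_filter=None, limit=-1,
                     sample=-1, drop_duplicates=False):
        # This sketch only supports column selection and limit.
        if row_filter is not None or sample >= 0 or drop_duplicates:
            raise NotImplementedError("not supported in this sketch")
        table = self._tables[source]
        names = columns if columns is not None else list(table)
        # limit=-1 (the default) means "no limit".
        stop = None if limit < 0 else limit
        return {name: table[name][:stop] for name in names}

    def read_to_pandas(self, source, columns=None, row_filter=None, limit=-1,
                       sample=-1, drop_duplicates=False):
        # Imported lazily so the rest of the sketch works without pandas installed.
        import pandas as pd
        return pd.DataFrame(self.read_to_dict(source, columns, row_filter,
                                              limit, sample, drop_duplicates))
```

Because both methods are abstract, a subclass that forgets one of them cannot be instantiated; deriving read_to_pandas from read_to_dict keeps the filtering logic in one place.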
abstract method

read_to_pandas(source, columns=None, row_filter=None, limit=-1, sample=-1, drop_duplicates=False)

Read data into a pandas.DataFrame

Parameters
  • source (str) A string specifying the data source (format differs by backend)
  • columns (list of str, optional) List of column names to limit the reading to
  • row_filter (str, optional) Filter expression for selecting rows.
  • limit (int, optional) Maximum number of rows to return (top-n)
  • sample (int, optional) Size of a random sample to return
  • drop_duplicates (bool, optional) Whether to drop duplicate rows from the final selection
Returns (DataFrame)

A pandas DataFrame with the requested data.

The filtering arguments (row_filter, limit, sample and drop_duplicates) are applied in the following order:

  • first the row_filter expression is applied and all matching rows go into the next step,
  • afterwards the limit argument is applied if given,
  • in the next step the sample argument is applied if it is specified,
  • at the very end, drop_duplicates takes effect. This flag can therefore reduce the output size further: if the data contains duplicates, fewer rows may be returned than specified with limit or sample.
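The ordering above can be illustrated with a plain-Python sketch over a list of row dicts. apply_selection and its seed parameter are hypothetical, and a Python predicate stands in for the backend-specific row_filter expression string. The sketch also assumes that -1 disables limit and sample (matching the defaults) and that a sample larger than the remaining rows simply returns all of them.

```python
import random
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def apply_selection(rows: List[Row],
                    row_filter: Optional[Callable[[Row], bool]] = None,
                    limit: int = -1,
                    sample: int = -1,
                    drop_duplicates: bool = False,
                    seed: Optional[int] = None) -> List[Row]:
    """Apply row_filter -> limit -> sample -> drop_duplicates, in that order."""
    # 1. Keep only rows matching the filter (a predicate replaces the
    #    backend-specific filter expression in this sketch).
    if row_filter is not None:
        rows = [row for row in rows if row_filter(row)]
    # 2. Top-n: keep the first `limit` rows; -1 means no limit.
    if limit >= 0:
        rows = rows[:limit]
    # 3. Random sample from what is left; -1 means no sampling.
    if sample >= 0:
        rng = random.Random(seed)
        rows = rng.sample(rows, min(sample, len(rows)))
    # 4. Drop duplicates last, keeping the first occurrence of each row
    #    (row values must be hashable for this sketch).
    if drop_duplicates:
        seen = set()
        unique = []
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                unique.append(row)
        rows = unique
    return rows
```

With rows = [{"a": 1}, {"a": 1}, {"a": 2}, {"a": 3}], the call apply_selection(rows, limit=3, drop_duplicates=True) first keeps three rows and then drops the duplicate, so it returns only [{"a": 1}, {"a": 2}], two rows despite limit=3.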
abstract method

read_to_dict(source, columns=None, row_filter=None, limit=-1, sample=-1, drop_duplicates=False)

Read data into a dict of named columns

Parameters
  • source (str) A string specifying the data source (format differs by backend)
  • columns (list of str, optional) List of column names to limit the reading to
  • row_filter (str, optional) NOT IMPLEMENTED. Reserved keyword for filtering rows.
  • limit (int, optional) Maximum number of rows to return (top-n)
  • sample (int, optional) Size of a random sample to return
  • drop_duplicates (bool, optional) Whether to drop duplicate rows
Returns (dict(str: list))

A dictionary with column names as keys and lists of column values as values

The logic of the filtering arguments is as documented for read_to_pandas().
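For illustration, the column-oriented shape returned by read_to_dict() can be produced from row records with a small helper. rows_to_columns is hypothetical and not part of dframeio; it only shows what the returned dict looks like.

```python
from typing import Any, Dict, Iterable, List, Optional


def rows_to_columns(rows: Iterable[Dict[str, Any]],
                    columns: Optional[List[str]] = None) -> Dict[str, List[Any]]:
    """Pivot row records into a dict mapping column names to value lists."""
    rows = list(rows)
    if columns is None:
        # Use the column order of the first row; empty input gives no columns.
        columns = list(rows[0]) if rows else []
    # One list per column, with one entry per row.
    return {name: [row[name] for row in rows] for name in columns}
```

Each list holds one value per row, so all lists have the same length; a dict of this shape can also be passed directly to the pandas.DataFrame constructor.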

class

dframeio.abstract.AbstractDataFrameWriter()

Interface for writing dataframes to different storage drivers

Methods
  • write_append(target, dataframe) Write data in append-mode
  • write_replace(target, dataframe) Write data with full replacement of an existing dataset
abstract method

write_replace(target, dataframe)

Write data with full replacement of an existing dataset

abstract method

write_append(target, dataframe)

Write data in append-mode
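A minimal in-memory sketch contrasting the two write modes. WriterInterface is a hypothetical stand-in for AbstractDataFrameWriter, and the sketch assumes dataframe is a dict of column lists; the type expected by real backends is not specified here. Creating a missing target on append and rejecting mismatched columns are choices made for this sketch, not documented behavior.

```python
import abc
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]


class WriterInterface(abc.ABC):
    """Hypothetical stand-in mirroring dframeio.abstract.AbstractDataFrameWriter."""

    @abc.abstractmethod
    def write_replace(self, target: str, dataframe: Columns) -> None: ...

    @abc.abstractmethod
    def write_append(self, target: str, dataframe: Columns) -> None: ...


class InMemoryWriter(WriterInterface):
    """Hypothetical backend storing each target as a dict of column lists."""

    def __init__(self) -> None:
        self.tables: Dict[str, Columns] = {}

    def write_replace(self, target: str, dataframe: Columns) -> None:
        # Discard anything stored under `target` and keep a copy of the new data.
        self.tables[target] = {name: list(values) for name, values in dataframe.items()}

    def write_append(self, target: str, dataframe: Columns) -> None:
        existing = self.tables.get(target)
        if existing is None:
            # Appending to a target that does not exist yet simply creates it.
            self.write_replace(target, dataframe)
            return
        if set(existing) != set(dataframe):
            raise ValueError("column names must match the existing dataset")
        # Extend every column with the new rows.
        for name, values in dataframe.items():
            existing[name].extend(values)
```

After write_replace("t", {"a": [1, 2]}) and write_append("t", {"a": [3]}), the target holds {"a": [1, 2, 3]}; a further write_replace("t", {"a": [9]}) leaves only {"a": [9]}.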