module

dframeio.abstract

Abstract interfaces for all storage backends

Classes

AbstractDataFrameReader — Interface for reading dataframes from different storage drivers
AbstractDataFrameWriter — Interface for writing dataframes to different storage drivers

class

`dframeio.abstract.AbstractDataFrameReader()`

Interface for reading dataframes from different storage drivers

Methods

read_to_dict(source, columns, row_filter, limit, sample, drop_duplicates) (dict(str: )) — Read data into a dict of named columns
read_to_pandas(source, columns, row_filter, limit, sample, drop_duplicates) (DataFrame) — Read data into a pandas.DataFrame

abstract method

`read_to_pandas(source, columns=None, row_filter=None, limit=-1, sample=-1, drop_duplicates=False)`

Read data into a pandas.DataFrame

Parameters

source (str) — A string specifying the data source (format differs by backend)
columns (list of str, optional) — List of column names to limit the reading to
row_filter (str, optional) — Filter expression for selecting rows.
limit (int, optional) — Maximum number of rows to return (top-n)
sample (int, optional) — Size of a random sample to return
drop_duplicates (bool, optional) — Whether to drop duplicate rows from the final selection

Returns (DataFrame)

A pandas DataFrame with the requested data.

The filtering arguments are applied in the following order:

First, the row_filter expression is applied, and all matching rows go on to the next step.
Next, the limit argument is applied, if given.
Then the sample argument is applied, if specified.
Finally, drop_duplicates takes effect. Because this flag is applied last, it can reduce the output further: if the data contains duplicates, fewer rows may be returned than specified with limit or sample.
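
The order of operations described above can be sketched with plain pandas calls. This is an illustration only, not the code of any dframeio backend: the helper name `apply_read_options` is hypothetical, `DataFrame.query` stands in for whatever filter syntax a backend actually accepts, and treating a negative `limit` or `sample` as "not set" is an assumption based on the `-1` defaults in the signature.

```python
import pandas as pd


def apply_read_options(df, row_filter=None, limit=-1, sample=-1,
                       drop_duplicates=False, random_state=None):
    """Hypothetical helper applying the documented steps in order."""
    # Step 1: keep only rows matching the filter expression.
    # Assumption: pandas query syntax is used here as a stand-in.
    if row_filter is not None:
        df = df.query(row_filter)
    # Step 2: top-n limit (assumption: negative means "no limit").
    if limit >= 0:
        df = df.head(limit)
    # Step 3: random sample, never larger than what is left.
    if sample >= 0:
        df = df.sample(n=min(sample, len(df)), random_state=random_state)
    # Step 4: duplicates are dropped last, so the result can be
    # smaller than limit or sample.
    if drop_duplicates:
        df = df.drop_duplicates()
    return df


frame = pd.DataFrame({
    "a": [1, 1, 2, 3, 3, 4],
    "b": ["x", "x", "y", "z", "z", "w"],
})
result = apply_read_options(frame, row_filter="a < 4", limit=4,
                            drop_duplicates=True)
```

In this example the filter keeps five rows, the limit keeps the first four of them, and dropping the duplicate row leaves three, which is fewer than the requested limit of four.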

abstract method

`read_to_dict(source, columns=None, row_filter=None, limit=-1, sample=-1, drop_duplicates=False)`

Read data into a dict of named columns

Parameters

source (str) — A string specifying the data source (format differs by backend)
columns (list of str, optional) — List of column names to limit the reading to
row_filter (str, optional) — NOT IMPLEMENTED. Reserved keyword for filtering rows.
limit (int, optional) — Maximum number of rows to return (top-n)
sample (int, optional) — Size of a random sample to return
drop_duplicates (bool, optional) — Whether to drop duplicate rows

Returns (dict(str: ))

A dictionary with column names as keys and lists of column values as values.

The filtering arguments are applied in the same order as documented for read_to_pandas().
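
To illustrate what implementing this interface might look like, here is a minimal sketch of an in-memory backend. It makes several assumptions: `ReaderInterface` is a stand-in that mirrors the two documented abstract methods rather than dframeio's actual class, `InMemoryReader` and its `datasets` argument are hypothetical, and negative `limit` and `sample` values are treated as "not set".

```python
import random
from abc import ABC, abstractmethod


class ReaderInterface(ABC):
    """Stand-in mirroring the documented methods (not dframeio's source)."""

    @abstractmethod
    def read_to_pandas(self, source, columns=None, row_filter=None,
                       limit=-1, sample=-1, drop_duplicates=False):
        ...

    @abstractmethod
    def read_to_dict(self, source, columns=None, row_filter=None,
                     limit=-1, sample=-1, drop_duplicates=False):
        ...


class InMemoryReader(ReaderInterface):
    """Hypothetical backend: `source` is a key into a dict of column dicts."""

    def __init__(self, datasets):
        self._datasets = datasets

    def read_to_dict(self, source, columns=None, row_filter=None,
                     limit=-1, sample=-1, drop_duplicates=False):
        if row_filter is not None:
            # Documented as a reserved keyword for read_to_dict.
            raise NotImplementedError("row_filter is not implemented")
        data = self._datasets[source]
        names = list(data) if columns is None else list(columns)
        # Turn the selected columns into row tuples.
        rows = list(zip(*(data[name] for name in names)))
        if limit >= 0:
            rows = rows[:limit]
        if sample >= 0:
            rows = random.sample(rows, min(sample, len(rows)))
        if drop_duplicates:
            # Keeps the first occurrence; assumes hashable cell values.
            rows = list(dict.fromkeys(rows))
        # Turn the rows back into a dict of named columns.
        return {name: [row[i] for row in rows]
                for i, name in enumerate(names)}

    def read_to_pandas(self, source, columns=None, row_filter=None,
                       limit=-1, sample=-1, drop_duplicates=False):
        import pandas as pd  # lazy import: read_to_dict works without pandas
        return pd.DataFrame(self.read_to_dict(
            source, columns, row_filter, limit, sample, drop_duplicates))


reader = InMemoryReader(
    {"fruits": {"name": ["apple", "apple", "pear"], "qty": [1, 1, 5]}}
)
everything = reader.read_to_dict("fruits")
names_only = reader.read_to_dict("fruits", columns=["name"], limit=2,
                                 drop_duplicates=True)
```

Here `names_only` contains a single row even though `limit=2` was requested, because the two rows that survive the limit are duplicates of each other.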

class

`dframeio.abstract.AbstractDataFrameWriter()`

Interface for writing dataframes to different storage drivers

Methods

write_append(target, dataframe) — Write data in append-mode
write_replace(target, dataframe) — Write data with full replacement of an existing dataset

abstract method

`write_replace(target, dataframe)`

Write data with full replacement of an existing dataset

abstract method

`write_append(target, dataframe)`

Write data in append-mode
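
The difference between the two write modes can be sketched with a small in-memory writer. This is a hedged illustration, not dframeio's implementation: `WriterInterface` is a stand-in mirroring the two documented abstract methods, `InMemoryWriter` is hypothetical, and `dataframe` is assumed to be a dict of column lists because the page does not state the accepted types.

```python
from abc import ABC, abstractmethod


class WriterInterface(ABC):
    """Stand-in mirroring the documented methods (not dframeio's source)."""

    @abstractmethod
    def write_replace(self, target, dataframe):
        ...

    @abstractmethod
    def write_append(self, target, dataframe):
        ...


class InMemoryWriter(WriterInterface):
    """Hypothetical backend storing datasets in a plain dict."""

    def __init__(self):
        self.datasets = {}

    def write_replace(self, target, dataframe):
        # Full replacement: any existing data under `target` is discarded.
        self.datasets[target] = {
            name: list(values) for name, values in dataframe.items()
        }

    def write_append(self, target, dataframe):
        # Append mode: create the dataset if missing, then extend columns.
        existing = self.datasets.setdefault(
            target, {name: [] for name in dataframe}
        )
        if set(existing) != set(dataframe):
            raise ValueError("column names must match the existing dataset")
        for name, values in dataframe.items():
            existing[name].extend(values)


writer = InMemoryWriter()
writer.write_replace("numbers", {"a": [1, 2]})
writer.write_append("numbers", {"a": [3]})
after_append = dict(writer.datasets["numbers"])
writer.write_replace("numbers", {"a": [0]})
after_replace = dict(writer.datasets["numbers"])
```

After the append the dataset holds all three values; after the second `write_replace` only the newly written value remains.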
