module
dframeio.abstract
Abstract interfaces for all storage backends
Classes
AbstractDataFrameReader
— Interface for reading dataframes from different storage drivers
AbstractDataFrameWriter
— Interface for writing dataframes to different storage drivers
class
dframeio.abstract.AbstractDataFrameReader()
Interface for reading dataframes from different storage drivers
Methods
read_to_dict(source, columns, row_filter, limit, sample, drop_duplicates)
(dict(str: )) — Read data into a dict of named columns
read_to_pandas(source, columns, row_filter, limit, sample, drop_duplicates)
(DataFrame) — Read data into a pandas.DataFrame
abstract method
read_to_pandas(source, columns=None, row_filter=None, limit=-1, sample=-1, drop_duplicates=False)
Read data into a pandas.DataFrame
Parameters
source (str) — A string specifying the data source (format differs by backend)
columns (list of str, optional) — List of column names to limit the reading to
row_filter (str, optional) — Filter expression for selecting rows
limit (int, optional) — Maximum number of rows to return (top-n)
sample (int, optional) — Size of a random sample to return
drop_duplicates (bool, optional) — Whether to drop duplicate rows from the final selection
Returns (DataFrame)
A pandas DataFrame with the requested data.
The filter and limit arguments are applied in the following order:
- First, the row_filter expression is applied, and all matching rows go on to the next step.
- Then the limit argument is applied, if given.
- Next, the sample argument is applied, if specified.
- Finally, drop_duplicates takes effect. Because it runs last, this flag may reduce the output size further, so fewer rows may be returned than specified with limit or sample if there are duplicates in the data.
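The order of the four steps can be illustrated with a small sketch in plain Python. This is not dframeio's implementation: `select_rows` and its `seed` parameter are hypothetical, `row_filter` is a Python predicate here rather than a filter expression string, and the sketch assumes that the default value -1 means "no limit" and "no sampling".

```python
import random


def select_rows(rows, row_filter=None, limit=-1, sample=-1,
                drop_duplicates=False, seed=None):
    """Apply the selection steps in the documented order (hypothetical helper)."""
    # 1. Keep only the rows that match the filter.
    if row_filter is not None:
        rows = [r for r in rows if row_filter(r)]
    # 2. Top-n: truncate to `limit` rows (assumption: -1 means no limit).
    if limit >= 0:
        rows = rows[:limit]
    # 3. Random sample of what is left (assumption: -1 means no sampling).
    if sample >= 0:
        rng = random.Random(seed)
        rows = rng.sample(rows, min(sample, len(rows)))
    # 4. Deduplicate last, keeping first occurrences in order.
    #    This can shrink the result below `limit` or `sample`.
    if drop_duplicates:
        rows = list(dict.fromkeys(rows))
    return rows


rows = [(1, "a"), (1, "a"), (2, "b"), (3, "c"), (3, "c")]
result = select_rows(rows, row_filter=lambda r: r[0] < 3, limit=2,
                     drop_duplicates=True)
print(result)  # [(1, 'a')]
```

Although `limit=2` was requested, only one row comes back: the first two rows that pass the filter are identical, and deduplication runs after the limit has already been applied.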
abstract method
read_to_dict(source, columns=None, row_filter=None, limit=-1, sample=-1, drop_duplicates=False)
Read data into a dict of named columns
Parameters
source (str) — A string specifying the data source (format differs by backend)
columns (list of str, optional) — List of column names to limit the reading to
row_filter (str, optional) — NOT IMPLEMENTED. Reserved keyword for filtering rows.
limit (int, optional) — Maximum number of rows to return (top-n)
sample (int, optional) — Size of a random sample to return
drop_duplicates (bool, optional) — Whether to drop duplicate rows
Returns (dict(str: ))
A dictionary with column names as keys and lists of column values as values.
The logic of the filtering arguments is as documented for read_to_pandas().
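To make the return shape concrete, here is a minimal sketch of a backend that implements `read_to_dict`. `ReaderInterface` is a hypothetical stand-in that only mirrors the documented signature so the example runs without dframeio installed. `InMemoryReader` is likewise invented for illustration and handles only `columns` and `limit`, assuming -1 means no limit.

```python
from abc import ABC, abstractmethod


class ReaderInterface(ABC):
    """Hypothetical stand-in mirroring the documented method signature."""

    @abstractmethod
    def read_to_dict(self, source, columns=None, row_filter=None,
                     limit=-1, sample=-1, drop_duplicates=False):
        ...


class InMemoryReader(ReaderInterface):
    """Toy backend: `source` is a key into a dict of column-oriented tables."""

    def __init__(self, tables):
        self._tables = tables

    def read_to_dict(self, source, columns=None, row_filter=None,
                     limit=-1, sample=-1, drop_duplicates=False):
        table = self._tables[source]
        names = list(table) if columns is None else columns
        # Only column selection and top-n are sketched; the other
        # arguments are accepted but ignored in this toy backend.
        stop = None if limit < 0 else limit
        return {name: list(table[name][:stop]) for name in names}


reader = InMemoryReader({"people": {"name": ["Ada", "Bob", "Cy"],
                                    "age": [36, 41, 29]}})
data = reader.read_to_dict("people", columns=["name"], limit=2)
print(data)  # {'name': ['Ada', 'Bob']}
```

Each key is a column name and each value is a list holding that column's values, which matches the return description above.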
class
dframeio.abstract.AbstractDataFrameWriter()
Interface for writing dataframes to different storage drivers
Methods
write_append(target, dataframe)
— Write data in append-mode
write_replace(target, dataframe)
— Write data with full replacement of an existing dataset
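The difference between the two write modes can be sketched with a toy in-memory writer. `InMemoryWriter` is hypothetical and not part of dframeio. It assumes `dataframe` is a dict mapping column names to lists (the type is not stated here), that appended data has the same columns as the existing dataset, and that appending to a missing target creates it.

```python
class InMemoryWriter:
    """Toy writer keeping datasets in a dict keyed by `target` (hypothetical)."""

    def __init__(self):
        self.tables = {}

    def write_replace(self, target, dataframe):
        # Discard whatever was stored under `target` and store a copy.
        self.tables[target] = {k: list(v) for k, v in dataframe.items()}

    def write_append(self, target, dataframe):
        # Add the new rows after the existing ones; an unknown target
        # starts out empty (an assumption of this sketch).
        existing = self.tables.setdefault(target, {k: [] for k in dataframe})
        for name, values in dataframe.items():
            existing[name].extend(values)


writer = InMemoryWriter()
writer.write_replace("t", {"x": [1, 2]})
writer.write_append("t", {"x": [3]})
after_append = dict(writer.tables["t"])
writer.write_replace("t", {"x": [9]})
after_replace = dict(writer.tables["t"])
print(after_append, after_replace)  # {'x': [1, 2, 3]} {'x': [9]}
```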