module

dframeio.filter

Filter expressions for data reading operations with predicate pushdown.

This module translates filter expressions from a simplified SQL syntax into the formats understood by the various backends. This way the same filter language can be used regardless of the data source.

The grammar of the filter statements is the same as in a WHERE clause in SQL. Supported features:

  • Comparing column values to numbers, strings and another column's values using the operators > < = != >= <=, e.g. a.column < 5
  • Comparison against a set of values with IN and NOT IN, e.g. a.column IN (1, 2, 3)
  • Boolean combination of conditions with AND, OR and NOT
  • NULL comparison as in a IS NULL or b IS NOT NULL

Strings can be quoted with single or double quotes. Column names may optionally be quoted with SQL quotes (backticks), for example:

`a.column` = "abc" AND b IS NOT NULL OR index < 50
Functions
  • to_prefix_notation(statement) (str) Parse a filter statement and return it in prefix notation.
  • to_psql(statement) (str) Convert a filter statement to Postgres SQL syntax.
  • to_pyarrow_dnf(statement) (Union(list of list of (str, str, any), list of (str, str, any), (str, str, any))) Convert a filter statement to the disjunctive normal form understood by pyarrow.
function

dframeio.filter.to_prefix_notation(statement)

Parse a filter statement and return it in prefix notation.

Parameters
  • statement (str) A filter predicate as string
Returns (str)

The filter statement in prefix notation (Polish notation) as a string.

Examples
>>> to_prefix_notation("a.column != 0")
'(!= Column<a.column> 0)'
>>> to_prefix_notation("a > 1 and b <= 3")
'(AND (> Column<a> 1) (<= Column<b> 3))'
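
To make the output format above easier to read, the following is a minimal sketch of how a nested expression tree maps to a prefix-notation string. It does not use dframeio's parser: the `Column` class, the tuple-based tree and `render_prefix` are hypothetical names introduced only for this illustration.

```python
class Column:
    """Hypothetical stand-in for a column reference (not dframeio's class)."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return f"Column<{self.name}>"


def render_prefix(node):
    """Render a nested (operator, *operands) tuple in prefix notation."""
    if isinstance(node, tuple):
        op, *operands = node
        # The operator comes first, followed by its rendered operands.
        return "(" + " ".join([op] + [render_prefix(o) for o in operands]) + ")"
    # Leaves (columns and literals) are rendered with str().
    return str(node)


# Assumed tree for the statement "a > 1 and b <= 3".
tree = ("AND", (">", Column("a"), 1), ("<=", Column("b"), 3))
print(render_prefix(tree))  # (AND (> Column<a> 1) (<= Column<b> 3))
```

Each parenthesised group holds one operator followed by its operands, so the nesting of the string mirrors the nesting of the parsed expression.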
function

dframeio.filter.to_pyarrow_dnf(statement)

Convert a filter statement to the disjunctive normal form understood by pyarrow.

Predicates are expressed in disjunctive normal form (DNF), like [[('x', '=', 0), ...], ...]. The outer list is understood as a chain of disjunctions ("or"), every inner list as a chain of conjunctions ("and"). Each inner list contains tuples that describe a single operation in infix notation. More information about the format and its limitations can be found in the pyarrow documentation.

Parameters
  • statement (str) A filter predicate as string
Returns (Union(list of list of (str, str, any), list of (str, str, any), (str, str, any)))

The filter statement converted to a list of lists of tuples.

Raises
  • ValueError If the statement cannot be parsed
Examples
>>> to_pyarrow_dnf("a.column != 0")
[[('a.column', '!=', 0)]]
>>> to_pyarrow_dnf("a > 1 and b <= 3")
[[('a', '>', 1), ('b', '<=', 3)]]
>>> to_pyarrow_dnf("a > 1 and b <= 3 or c = 'abc'")
[[('a', '>', 1), ('b', '<=', 3)], [('c', '=', 'abc')]]
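
The "or of ands" semantics of the DNF format can be sketched in plain Python. This is only an illustration of how such a list is interpreted, not how pyarrow evaluates it. The `OPS` table and the `matches` helper are hypothetical, only the comparison operators are covered, and the sketch assumes the list-of-lists form of the return value.

```python
import operator

# Hypothetical mapping from operator strings to Python comparisons.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def matches(row, dnf):
    """Return True if any inner list (AND chain) is fully satisfied by row."""
    return any(
        all(OPS[op](row[column], value) for column, op, value in conjunction)
        for conjunction in dnf
    )


# DNF taken from the last example above: a > 1 and b <= 3 or c = 'abc'
dnf = [[("a", ">", 1), ("b", "<=", 3)], [("c", "=", "abc")]]

rows = [
    {"a": 2, "b": 3, "c": "x"},    # satisfies the first conjunction
    {"a": 0, "b": 9, "c": "abc"},  # satisfies the second conjunction
    {"a": 0, "b": 3, "c": "x"},    # satisfies neither
]
selected = [row for row in rows if matches(row, dnf)]
print(selected)
```

In practice the converted list would typically be handed to a pyarrow reader through its `filters` argument (for example `pyarrow.parquet.read_table`), so that rows are filtered while the data is read.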
function

dframeio.filter.to_psql(statement)

Convert a filter statement to Postgres SQL syntax.

Parameters
  • statement (str) A filter predicate as string
Returns (str)

The filter statement converted to psql.

Raises
  • ValueError If the statement cannot be parsed
Examples
>>> to_psql("a.column != 0")
'"a.column" <> 0'
>>> to_psql("a > 1 and b <= 3")
'"a" > 1 AND "b" <= 3'
>>> to_psql("a > 1 and b <= 3 or c = 'abc'")
'"a" > 1 AND "b" <= 3 OR "c" = \'abc\''
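
The returned string is meant to be used as the body of a WHERE clause. The sketch below embeds the output of the last example in a query. As an assumption made purely so that the snippet runs without a database server, it uses Python's built-in `sqlite3` as a stand-in for Postgres. The table `t` and its rows are invented for the illustration.

```python
import sqlite3

# Output of to_psql("a > 1 and b <= 3 or c = 'abc'"), copied from the example above.
where_clause = '"a" > 1 AND "b" <= 3 OR "c" = \'abc\''

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("a" INTEGER, "b" INTEGER, "c" TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(2, 3, "x"), (0, 9, "abc"), (0, 3, "x")],
)

# AND binds more tightly than OR, so this reads (a > 1 AND b <= 3) OR c = 'abc'.
rows = conn.execute(
    f"SELECT * FROM t WHERE {where_clause} ORDER BY rowid"
).fetchall()
conn.close()
print(rows)
```

Column names are double-quoted in the generated clause, which keeps names such as `a.column` from being interpreted as a table-qualified reference.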