dframeio.filter
Filter expressions for data reading operations with predicate pushdown.
This module translates filter expressions from a simplified SQL syntax into the formats understood by the various backends. This way, the same language can be used for filtering regardless of the data source.
The grammar of the filter statements is the same as in a WHERE clause in SQL. Supported features:
- Comparing column values to numbers, strings and another column's values using the operators > < = != >= <=, e.g. a.column < 5
- Comparison against a set of values with IN and NOT IN, e.g. a.column IN (1, 2, 3)
- Boolean combination of conditions with AND, OR and NOT
- NULL comparison as in a IS NULL or b IS NOT NULL
Strings can be quoted with single quotes or double quotes. Column names may optionally be quoted with SQL quotes (backticks). For example:
`a.column` = "abc" AND b IS NOT NULL OR index < 50
to_prefix_notation(statement) (str): Parse a filter statement and return it in prefix notation.
to_psql(statement) (str): Convert a filter statement to Postgres SQL syntax.
to_pyarrow_dnf(statement) (Union(list of list of (str, str, any), list of (str, str, any), (str, str, any))): Convert a filter statement to the disjunctive normal form understood by pyarrow.
dframeio.filter.to_prefix_notation(statement)
Parse a filter statement and return it in prefix notation.
statement (str): A filter predicate as a string
Returns: The filter statement in prefix notation (Polish notation) as a string
>>> to_prefix_notation("a.column != 0")
'(!= Column<a.column> 0)'
>>> to_prefix_notation("a > 1 and b <= 3")
'(AND (> Column<a> 1) (<= Column<b> 3))'
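The prefix output is a parenthesised expression, so it can be read back into a nested structure for inspection. The following is a minimal sketch and not part of dframeio: parse_prefix is a hypothetical helper, and it assumes that every atom (operator, Column<...> reference or number) is free of whitespace and parentheses. How string literals appear in the prefix output is not shown above, so they are not handled here.

```python
import re


def parse_prefix(text):
    """Parse a parenthesised prefix expression into nested Python lists.

    Hypothetical helper for illustration only. Atoms are kept as strings.
    """
    # Split into "(", ")" and runs of characters that are neither
    # whitespace nor parentheses.
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # an atom such as "Column<a>" or "1"
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # skip the closing ")"
        return node

    return read()


tree = parse_prefix("(AND (> Column<a> 1) (<= Column<b> 3))")
print(tree)
# ['AND', ['>', 'Column<a>', '1'], ['<=', 'Column<b>', '3']]
```

The first element of each list is the operator and the remaining elements are its operands, which mirrors how the prefix notation nests conditions.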
dframeio.filter.to_pyarrow_dnf(statement)
Convert a filter statement to the disjunctive normal form understood by pyarrow.
Predicates are expressed in disjunctive normal form (DNF), like [[('x', '=', 0), ...], ...].
The outer list is understood as a chain of disjunctions ("or"), and every inner list as a chain
of conjunctions ("and"). Each inner list contains tuples, and each tuple holds a single operation
in infix notation.
More information about the format and its limitations can be found in the
pyarrow documentation.
statement (str): A filter predicate as a string
Returns: The filter statement converted to a list of lists of tuples.
Raises: ValueError if the statement cannot be parsed
>>> to_pyarrow_dnf("a.column != 0")
[[('a.column', '!=', 0)]]
>>> to_pyarrow_dnf("a > 1 and b <= 3")
[[('a', '>', 1), ('b', '<=', 3)]]
>>> to_pyarrow_dnf("a > 1 and b <= 3 or c = 'abc'")
[[('a', '>', 1), ('b', '<=', 3)], [('c', '=', 'abc')]]
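To make the "or of ands" reading of the DNF concrete, the sketch below evaluates such a list of lists against a single row held in a plain dict. It does not use pyarrow or dframeio; row_matches and OPS are hypothetical names introduced for illustration, and the operator set is the one listed in the pyarrow Parquet filter documentation.

```python
import operator

# Comparison operators that may appear in the middle of a DNF tuple.
OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "in": lambda value, candidates: value in candidates,
    "not in": lambda value, candidates: value not in candidates,
}


def row_matches(row, dnf):
    """Return True if the row satisfies a DNF filter (list of lists of tuples).

    Hypothetical helper: the outer list is OR-ed, each inner list is AND-ed.
    """
    return any(
        all(OPS[op](row[column], value) for column, op, value in conjunction)
        for conjunction in dnf
    )


dnf = [[("a", ">", 1), ("b", "<=", 3)], [("c", "=", "abc")]]
print(row_matches({"a": 2, "b": 3, "c": "x"}, dnf))    # True, first inner list holds
print(row_matches({"a": 0, "b": 9, "c": "abc"}, dnf))  # True, second inner list holds
print(row_matches({"a": 0, "b": 9, "c": "x"}, dnf))    # False, neither holds
```

A row passes as soon as one inner list is fully satisfied, which is why a statement with OR produces more than one inner list.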
dframeio.filter.to_psql(statement)
Convert a filter statement to Postgres SQL syntax
statement (str): A filter predicate as a string
Returns: The filter statement converted to psql.
Raises: ValueError if the statement cannot be parsed
>>> to_psql("a.column != 0")
'"a.column" <> 0'
>>> to_psql("a > 1 and b <= 3")
'"a" > 1 AND "b" <= 3'
>>> to_psql("a > 1 and b <= 3 or c = 'abc'")
'"a" > 1 AND "b" <= 3 OR "c" = \'abc\''
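The returned string is meant to be used as the body of a WHERE clause. As a rough sketch of that use, the example below runs the clause from the last example against an in-memory SQLite database. SQLite is only a stand-in here, chosen because it ships with Python and accepts double-quoted identifiers; the table t and its rows are made up for illustration, and no dframeio or Postgres call is involved.

```python
import sqlite3

# WHERE clause copied from the to_psql example above.
where = '"a" > 1 AND "b" <= 3 OR "c" = \'abc\''

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(2, 3, "x"), (0, 9, "abc"), (0, 9, "x")],
)

# Interpolating the clause is acceptable here only because it is a
# generated, trusted string; user input should never be pasted into SQL.
rows = conn.execute(f"SELECT a, b, c FROM t WHERE {where} ORDER BY a").fetchall()
conn.close()

print(rows)
# [(0, 9, 'abc'), (2, 3, 'x')]
```

Because AND binds more tightly than OR in SQL, the clause selects rows where both a > 1 and b <= 3 hold, or where c equals 'abc'. This is the same grouping as the two inner lists in the corresponding pyarrow DNF.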