ExecuteSQL Avro schema: all fields are nullable

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

ExecuteSQL Avro schema: all fields are nullable

christophe.monnet
Reply | Threaded
Open this post in threaded view
|

Re: ExecuteSQL Avro schema: all fields are nullable

Matt Burgess-2
Christophe,

IIRC the isNullable() method gives inconsistent results across JDBC
drivers (see [1], [2] for examples) and in such cases might cause more
harm than good. Having said that, we could perhaps add a property to
the relevant components such as "Honor Non-Nullable Fields" or
something like that, where the user could choose "true" if they were
confident that their JDBC driver supported isNullable() correctly and
that their SQL queries would return result sets for whose metadata
(such as isNullable) is correct. Specifically I mean the case from [2]
where you do a join from a table with a nullable column with another
having the same column but non-nullable field. For an outer join the
result column should be nullable, but in [2] they point out that
PostgreSQL (at least in 2012 at the time of the post), if you select
the field from the non-nullable table, then the
ResultSetMetaData.isNullable() will return false although it can
clearly be null.

As a workaround, you can use ConvertRecord with an output schema that
removes the nullable union from the fields. There should be no nulls
in the data (as the column was supposedly non-nullable) so there
should be no error there. If for some reason you want to check that
there are no nulls downstream, you can use ValidateRecord with the
schema that has the nullable unions removed, then any records with a
null value in that column will be transferred to invalid, and only the
non-nulls will be transferred to success.

If you'd like to have the configurable property to generate
non-nullable fields in the schema, please feel free to file an
improvement Jira [3] and we can continue the discussion there :)

Regards,
Matt

[1] https://github.com/confluentinc/kafka-connect-jdbc/issues/197
[2] https://www.postgresql.org/message-id/63DBC81F-2AB0-4C02-AC08-2B05C31FFCA6%40gmail.com
[3] https://issues.apache.org/jira/browse/NIFI
On Thu, Sep 13, 2018 at 7:43 AM <[hidden email]> wrote: