[DISCUSS] Expression Language - New Feature (UDF)

Ed B
Hi Devs,

I've almost finished developing a new feature for EL: UDFs (User Defined
Functions).
It should make EL more flexible and reduce the complexity of flows.

Example:
${exec('com.example.MyDateUtils', 'minus', ${now()}, '1 year')}
or
${myAttribute:exec('com.example.MyMasker', 'maskCreditCard')}

While making it more generic and flexible, I've run into some challenges
that I would like to discuss.

First of all, in the current implementation of EL, every expression must
have an output type. Each evaluator has to specify which type it returns
(STRING, BOOLEAN, WHOLE_NUMBER, DATE, DECIMAL, NUMBER). As of now, I don't
think it is possible to make this generic without impacting the entire EL
framework, which I would obviously like to avoid.
So, the first challenge: the new EXEC function will always return the
String type. Or should I try to introduce new bugs in the core of EL? :)

The second challenge is the interface of this function.
The easiest way is to define a new interface,
org.apache.nifi.attribute.expression.language.ExecutableUDF, with a method
"execute" that accepts an array of Strings. Two cons: a new class
implementation for each function, and casting from Strings to the desired
data types.
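
A minimal sketch of what the interface approach could look like. Only the
interface name and the method name "execute" come from the message above;
the String return type, the varargs signature, and the two sample UDF
classes are assumptions made for illustration.

```java
import java.util.Arrays;

class InterfaceUdfSketch {

    /** Assumed shape of the proposed interface; only the method name is given in the thread. */
    interface ExecutableUDF {
        String execute(String... args);
    }

    /** Hypothetical UDF: keeps the last four digits of a card number and masks the rest. */
    static class MaskCreditCard implements ExecutableUDF {
        @Override
        public String execute(String... args) {
            if (args == null || args.length != 1 || args[0] == null) {
                throw new IllegalArgumentException("expected exactly one non-null argument");
            }
            String digits = args[0].replaceAll("\\D", ""); // drop spaces and dashes
            if (digits.length() <= 4) {
                return digits;
            }
            char[] mask = new char[digits.length() - 4];
            Arrays.fill(mask, '*');
            return new String(mask) + digits.substring(digits.length() - 4);
        }
    }

    /** Hypothetical UDF showing the casting cost: every argument is parsed from a String. */
    static class AddWholeNumbers implements ExecutableUDF {
        @Override
        public String execute(String... args) {
            long sum = 0;
            for (String arg : args) {
                sum += Long.parseLong(arg.trim());
            }
            return Long.toString(sum);
        }
    }

    public static void main(String[] args) {
        System.out.println(new MaskCreditCard().execute("4111 1111 1111 1111"));
        System.out.println(new AddWholeNumbers().execute("40", "2"));
    }
}
```

Both cons show up here: each function needs its own class, and
AddWholeNumbers has to parse its arguments from Strings and format the
result back into a String.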

The more advanced way: no interface; methods are looked up by the types
provided in the EL expression itself. This is a more open API, but it needs
more reflection and more assumptions during error handling. It is more
flexible and doesn't require implementing a given interface (fewer Maven
dependencies and, as a result, less hassle during upgrades, especially if
there are changes in how class loaders are defined, etc.). I have it
implemented and it works fine (with some limitations related to the mapping
between EL and Java data types), but I'm not sure whether we need to limit
this to the interface implementation...
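
A rough sketch of the reflection-based lookup, not the code from the actual
implementation. The class SampleUtils, the helper exec, and the rule that
target methods must be public and static are all assumptions. Method
resolution uses the runtime classes of the arguments, and the result is
always converted to a String, matching the EXEC return type described above.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class ReflectiveUdfSketch {

    /** Stand-in for a user-supplied class such as com.example.MyDateUtils (hypothetical). */
    public static class SampleUtils {
        public static Long add(Long a, Long b) {
            return a + b;
        }

        public static String greet(String name) {
            return "Hello, " + name;
        }
    }

    /** Finds a public static method by name and argument types, invokes it, returns a String. */
    static String exec(Class<?> target, String methodName, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            if (args[i] == null) {
                throw new IllegalArgumentException("null argument at position " + i);
            }
            types[i] = args[i].getClass(); // exact runtime type, e.g. Long, never long
        }
        try {
            Method method = target.getMethod(methodName, types);
            if (!Modifier.isStatic(method.getModifiers())) {
                throw new IllegalArgumentException(methodName + " must be static");
            }
            return String.valueOf(method.invoke(null, args));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                    "no public method " + methodName + " for the given argument types", e);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("call to " + methodName + " failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(exec(SampleUtils.class, "add", 40L, 2L));
        System.out.println(exec(SampleUtils.class, "greet", "NiFi"));
    }
}
```

The exact-match lookup illustrates the type-mapping limitation: a method
declared with primitive long parameters would not be found when the
arguments arrive as Long objects, so a real implementation would need
explicit rules for mapping EL types to Java types.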

I would appreciate any feedback and comments from the community.

Thanks,
Ed.

Re: [DISCUSS] Expression Language - New Feature (UDF)

Joe Witt
Ed,

This is an interesting path to explore, but historically we've held
off going further because the use cases for it were never really well
articulated/strong relative to the obvious complexities of it.

This is definitely a good thing to discuss in more detail.

I think your note starts to enumerate some good places that will need
attention.  Things that come to top of mind for me:
- Would require changes to nifi-api

- Would probably require alignment with classloader isolation so that
the classloader of the component being used is leveraged when
evaluating the expression language and that the UDFs are found within
the given nar.  This would be quite a big change to EL overall.  Also
there are certainly other approaches I've not considered or talked
about here because they're really complex.

It seems like this would be a pretty major change in extensibility and
I wonder if this would be worth it.  If we're needing to make EL
statements so flexible it might be that attributes are being
over-leveraged.  It would be good to talk about the use cases in more
detail.  The components are the primary point of extension intended
here.  If the EL statements start to include class names, etc., then
the complexity of them seems not too different than someone just
writing a custom processor using one of the scripting options.

Certainly not trying to shut this down - just want to have a lot more
discussion/basis for it.

Thanks




On Wed, Jul 25, 2018 at 12:00 AM, Ed B <[hidden email]> wrote:


Re: [DISCUSS] Expression Language - New Feature (UDF)

Ed B
Hi Joe,
I initially missed your response and just continued development. Then I
needed to redesign the entire thing to address your concerns, because they
all made sense.

Now it is ready (NIFI-5492_EXEC Adding UDF to EL
<https://github.com/apache/nifi/pull/3008>) and I'm open to discussion.

Here are a few points regarding your concerns:
- EL APIs - no change (a new function was added)
- NiFi APIs - no change
- Full isolation of custom-provided JARs with UDFs. They don't need to be
part of any NAR.
- UDF JARs can be anywhere on the file system, and the nifi.properties file
will have a property for that location.
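
One way such isolation could be sketched, under assumptions: this message
does not say how the PR loads the JARs or what the nifi.properties key is
called, so the directory is passed in directly and the class and method
names are hypothetical. The sketch uses a plain URLClassLoader whose parent
is the parent of the system class loader, so UDF code can see JDK classes
and its own JARs but nothing from the application classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class UdfJarLoaderSketch {

    /** Collects the URLs of all *.jar files directly inside the given directory. */
    static List<URL> jarUrls(Path udfDir) {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(udfDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        } catch (IOException e) {
            throw new UncheckedIOException("cannot list UDF JARs in " + udfDir, e);
        }
        return urls;
    }

    /** Builds a class loader for the UDF JARs that skips the application classpath. */
    static URLClassLoader newUdfClassLoader(Path udfDir) {
        URL[] urls = jarUrls(udfDir).toArray(new URL[0]);
        ClassLoader parent = ClassLoader.getSystemClassLoader().getParent();
        return new URLClassLoader(urls, parent);
    }
}
```

A UDF class would then be resolved with something like
newUdfClassLoader(dir).loadClass("com.example.MyMasker"), and the loader
should be closed once the classes are no longer needed.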

I would appreciate it if you could take a look, at least at a high level,
and share your feedback with me. :)

Thank you!
Ed.

On Wed, Jul 25, 2018 at 1:56 PM Joe Witt <[hidden email]> wrote:
