In most cases, users want to write a Deserializer instead of a full SerDe, because they only need to read their own data format, not write to it.
For example, the RegexDeserializer deserializes the data using the configuration parameter 'regex', and possibly a list of column names.
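To make the idea concrete, here is a minimal sketch of regex-based deserialization in plain Java. It is not Hive's RegexDeserializer and does not implement Hive's Deserializer interface; the class name RegexRowDeserializer, its constructor, and its methods are hypothetical and chosen only for illustration. The assumption is that each capturing group in the regex corresponds to one column, in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone class for illustration only.
// It does NOT implement Hive's Deserializer interface.
class RegexRowDeserializer {
    private final Pattern pattern;
    private final List<String> columnNames;

    // 'regex' plays the role of the 'regex' configuration parameter;
    // 'columnNames' is the optional list of column names.
    RegexRowDeserializer(String regex, List<String> columnNames) {
        this.pattern = Pattern.compile(regex);
        this.columnNames = columnNames;
    }

    // Returns one string per capturing group, or null when the
    // whole line does not match the pattern.
    List<String> deserialize(String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> row = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            row.add(m.group(i));
        }
        return row;
    }

    List<String> getColumnNames() {
        return columnNames;
    }
}
```

With the regex `(\S+) (\d+)` and column names `host` and `status`, a line such as `example.org 200` would be split into two string fields, one per capturing group. Note that this sketch only reads data; there is no serialize method, which is the point of writing a Deserializer rather than a SerDe.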
If your SerDe supports DDL (basically, a SerDe with parameterized columns and column types), you probably want to implement a Protocol based on DynamicSerDe instead of writing a SerDe from scratch. The reason is that the framework passes DDL to the SerDe in "thrift DDL" format, and writing a "thrift DDL" parser is non-trivial.