Monday, December 30, 2013

How do you write your own custom SerDe?

      In most cases you only need to write a Deserializer rather than a full SerDe, because you just want to read your own data format, not write to it.
      For example, the RegexDeserializer deserializes the data using the configuration parameter 'regex', and possibly a list of column names.
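      To make the regex idea concrete, here is a minimal stand-alone sketch in plain Java. It is not Hive's RegexDeserializer and it does not implement Hive's Deserializer interface; the class name RegexRowReader, its constructor arguments, and the sample regex are all hypothetical, chosen only to show how a 'regex' setting plus a list of column names can turn one line of text into named fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a plain-Java stand-in for the idea behind
// a regex-driven deserializer. It does not use or mirror any Hive API.
class RegexRowReader {
    private final Pattern pattern;
    private final List<String> columnNames;

    // 'regex' plays the role of the configuration parameter; each capture
    // group is assumed to correspond to one column, in order.
    RegexRowReader(String regex, List<String> columnNames) {
        this.pattern = Pattern.compile(regex);
        this.columnNames = new ArrayList<>(columnNames);
    }

    // Returns column name -> captured text, or null when the whole row
    // does not match the regex.
    Map<String, String> deserialize(String row) {
        Matcher m = pattern.matcher(row);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        int n = Math.min(columnNames.size(), m.groupCount());
        for (int i = 0; i < n; i++) {
            fields.put(columnNames.get(i), m.group(i + 1)); // groups are 1-based
        }
        return fields;
    }

    public static void main(String[] args) {
        RegexRowReader reader =
            new RegexRowReader("(\\S+) (\\d+)", Arrays.asList("host", "status"));
        System.out.println(reader.deserialize("example.org 200"));
    }
}
```

      Note that the sketch only reads: there is no serialize method, which is the point of writing a Deserializer alone when you never write the format back out.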
      If your SerDe supports DDL (that is, a SerDe with parameterized columns and column types), you probably want to implement a Protocol based on DynamicSerDe instead of writing a SerDe from scratch. The reason is that the framework passes the DDL to the SerDe in "thrift DDL" format, and writing a "thrift DDL" parser is non-trivial.
