Releases: twitter/elephant-bird
Added CombinedWritableSequenceFile
Adds Cascading support for combining input splits for sequence files
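A hedged usage sketch: assuming CombinedWritableSequenceFile lives in the elephant-bird-cascading2 scheme package and mirrors the constructor of Cascading's WritableSequenceFile (both are assumptions, so check the release jar), wiring it into a tap would look roughly like this:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

// Package and constructor below are assumptions based on Cascading's WritableSequenceFile.
import com.twitter.elephantbird.cascading2.scheme.CombinedWritableSequenceFile;

public class CombinedSeqFileTapExample {
  public static Hfs source(String path) {
    // The scheme reads sequence files while allowing small input splits to be combined.
    return new Hfs(
        new CombinedWritableSequenceFile(new Fields("key", "value"),
            LongWritable.class, Text.class),
        path);
  }
}
```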
Follow-up fixes for cascading3
Follow-up fixes for cascading3 #465
Cascading 3 support
- Adds Cascading 3 support #463
- Restores API compatibility for ThriftBinaryProtocol/ThriftBinaryDeserializer #461
Upgrade notes:
- The following classes have been moved from elephant-bird-cascading2 to the elephant-bird-cascading-protobuf module:
ProtobufComparator, ProtobufDeserializer, ProtobufReflectionUtil, ProtobufSerialization, ProtobufSerializer
Namespace change:
com.twitter.elephantbird.cascading2.io.protobuf => com.twitter.elephantbird.cascading.protobuf
- cascading-hadoop is now marked as provided, so if you depend on elephant-bird-cascading2, you should explicitly add it to your build dependencies.
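For example, code that imported the protobuf serialization classes from the old package should only need its package prefix updated; the class names listed above are unchanged:

```java
// Before (elephant-bird-cascading2):
// import com.twitter.elephantbird.cascading2.io.protobuf.ProtobufSerialization;
// import com.twitter.elephantbird.cascading2.io.protobuf.ProtobufSerializer;

// After (elephant-bird-cascading-protobuf):
import com.twitter.elephantbird.cascading.protobuf.ProtobufSerialization;
import com.twitter.elephantbird.cascading.protobuf.ProtobufSerializer;
```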
Supporting Thrift 9
Thrift 9 support via a classifier: #455
Bugfix for invalid container sizes in ThriftBinaryProtocol
This release contains a single bugfix:
- Add container size check in ThriftBinaryProtocol #448
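For illustration only (this is not the elephant-bird code, and the limit shown is made up), the general shape of such a check is a TBinaryProtocol subclass that validates container sizes read from the wire before anything is allocated:

```java
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TList;
import org.apache.thrift.protocol.TProtocolException;
import org.apache.thrift.transport.TTransport;

public class SizeCheckingBinaryProtocol extends TBinaryProtocol {
  // Hypothetical cap; a corrupt record can otherwise claim an absurd element count.
  private static final int MAX_CONTAINER_SIZE = 10 * 1000 * 1000;

  public SizeCheckingBinaryProtocol(TTransport transport) {
    super(transport);
  }

  @Override
  public TList readListBegin() throws TException {
    TList list = super.readListBegin();
    if (list.size < 0 || list.size > MAX_CONTAINER_SIZE) {
      throw new TProtocolException("Invalid list size: " + list.size);
    }
    return list;
  }
}
```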
More compression options and Base64Codec fix
Change log:
Issue 444. Throw DecodeException in Base64Codec (Ruban Monu)
Issue 442. Add options to control the compression of intermediate data written by Cascading (Ian O'Connell)
Bugfix for max size limit check in SerializedBlock
This release contains an important bug fix in SerializedBlock and BinaryRecordReader.
Change log:
Issue 441. Use CodedInputStream in SerializedBlock to fix the max size limit check (Ruban Monu)
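For context, protobuf's CodedInputStream is what enforces the parser's total-size limit, and parsing through one lets that limit be set deliberately. A generic sketch of the pattern (not the actual SerializedBlock code):

```java
import java.io.IOException;

import com.google.protobuf.CodedInputStream;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.DynamicMessage;

// Hypothetical helper: parse a message with an explicit size limit instead of
// relying on the library default (64 MB in older protobuf-java releases).
public class SizeLimitedParse {
  public static DynamicMessage parse(Descriptor descriptor, byte[] bytes, int maxBytes)
      throws IOException {
    CodedInputStream in = CodedInputStream.newInstance(bytes);
    in.setSizeLimit(maxBytes);
    return DynamicMessage.parseFrom(descriptor, in);
  }
}
```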
Generic block record readers and performance improvements
This release includes new generic block record readers for LZO-compressed protobuf data. It also makes the minimum indexable file size configurable for LZO output and improves the performance of reading LZO indexes and splits.
Note: BinaryConverter now throws DecodeException if deserializing a record fails, instead of returning null.
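Callers that used to null-check the result should now handle the exception. A self-contained sketch of the calling-pattern change, using stand-in types (elephant-bird's real BinaryConverter and DecodeException live in its own packages and may differ in signature):

```java
// Stand-ins for illustration only.
interface BinaryConverter<M> {
  M fromBytes(byte[] messageBuffer) throws DecodeException;
}

class DecodeException extends Exception {
  DecodeException(String message) { super(message); }
}

class DecodeCaller {
  // Old style: fromBytes returned null on a corrupt record and callers null-checked.
  // New style: fromBytes throws, so the failure is handled explicitly.
  static <M> M decodeOrSkip(BinaryConverter<M> converter, byte[] bytes) {
    try {
      return converter.fromBytes(bytes);
    } catch (DecodeException e) {
      // count, log, or skip the corrupt record here
      return null;
    }
  }
}
```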
Change log:
Issue 440. LzoGenericBlockOutputFormat (Ruban Monu)
Issue 439. Adds generic block record readers (Ruban Monu)
Issue 435. Faster handling of LzoBinary data (Ian O'Connell)
Issue 434. Speed up getSplits by reusing FileStatus objects from the initial listStatus (Gera Shegalov)
Issue 430. Configurable minimum indexable file size (Gera Shegalov)
Critical bug fix, performance improvements, less code-gen, and more!
This release includes a critical fix that avoids reading the first block of a split twice in the block format.
It also uses dynamic protobufs instead of a code-generated protobuf for the block format, improves performance in the base64 code paths, and more!
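The dynamic-protobuf change means the block wrapper no longer needs a code-generated message class: the message type can be described and parsed at runtime. A minimal, self-contained sketch of that technique (the schema here is illustrative, not the actual block format):

```java
import com.google.protobuf.DescriptorProtos.DescriptorProto;
import com.google.protobuf.DescriptorProtos.FieldDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;

public class DynamicProtobufSketch {
  public static void main(String[] args) throws Exception {
    // Describe a message type at runtime instead of compiling a .proto file.
    FileDescriptorProto fileProto = FileDescriptorProto.newBuilder()
        .setName("block.proto")
        .addMessageType(DescriptorProto.newBuilder()
            .setName("Block")
            .addField(FieldDescriptorProto.newBuilder()
                .setName("version")
                .setNumber(1)
                .setLabel(FieldDescriptorProto.Label.LABEL_OPTIONAL)
                .setType(FieldDescriptorProto.Type.TYPE_INT32)))
        .build();

    FileDescriptor file = FileDescriptor.buildFrom(fileProto, new FileDescriptor[0]);
    Descriptor block = file.findMessageTypeByName("Block");

    // Build, serialize, and re-parse without any generated class.
    DynamicMessage message = DynamicMessage.newBuilder(block)
        .setField(block.findFieldByName("version"), 1)
        .build();
    DynamicMessage parsed = DynamicMessage.parseFrom(block, message.toByteArray());
    System.out.println(parsed);
  }
}
```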
Here's the full change log:
Issue 429. Avoid double reading of the first block in a split in the block format (Raghu Angadi)
Issue 421. Expose FileDescriptor and FieldDescriptor (Brian Ramos)
Issue 423. Don't copy the array before handing it to the base64 decoder (Ian O'Connell)
Issue 422. Pulls in the source of a BSD-licensed base64 implementation that is 5x faster than the Apache one for our usage (Ian O'Connell)
Issue 418. Use dynamic protobufs (Remove protobufs) (Raghu Angadi)
Issue 417. A Cascading scheme for combining intermediate sequence files (Akihiro Matsukawa)
Issue 414. Fix typo in docs (thrift, not thrist) (gstaubli)
Issue 413. Trivial Javadocs for LuceneIndexInputFormat (Lewis John McGibbney)
Issue 412. Adding support for Maps, Sets and Lists to ThriftToDynamicProto (Brian Ramos)
Issue 411. Fix NPE in CompositeRecordReader due to improper delegate initialization (Jonathan Coveney)
Issue 409. Gzip objects before storing them in the job conf (Alex Levenson) (see the sketch below)
Issue 407. Make dependencies explicit in the README quickstart (fixes #406) (Lewis John McGibbney)
Issue 405. Fix bug in CompositeRecordReader (Jonathan Coveney)
Issue 403. Refactor CompositeRecordReader to only make a record reader when necessary (Jonathan Coveney)
Issue 398. Add CombineFileInputFormat support (esp. for lzo) (Jonathan Coveney)
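On Issue 409: large serialized objects in the job conf get shipped with every task, so compressing them before base64-encoding keeps the conf small. A generic sketch of that approach (not elephant-bird's actual utility; the class and method names here are made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

import org.apache.hadoop.conf.Configuration;

public class ConfObjectWriter {
  public static void writeObject(Configuration conf, String key, Serializable value)
      throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // Java-serialize and gzip in one pass; closing the stream finishes the gzip trailer.
    try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
      out.writeObject(value);
    }
    // Base64-encode so the compressed payload is a safe Configuration string value.
    conf.set(key, Base64.getEncoder().encodeToString(bytes.toByteArray()));
  }
}
```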