Fix NPE when iterating over an input split in CompositeRecordReader.java #436
base: master
```java
this.value = value;
if (currentRecordReader == null) {
  try {
    if (!nextKeyValue()) {
```
Can we safely advance here? The interface suggests that setKeyValue will always be called before nextKeyValue, so calling nextKeyValue here could cause us to skip a record?
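To make the concern concrete, here is a hypothetical toy model, not elephant-bird code: `Holder`, `SplitReader`, `CompositeReader` and `drain` are invented for illustration. It assumes the composite reader drops its per-split reader right after the last record of a split, and it models only the value, not the key. `drain` calls `setKeyValue` before every `nextKeyValue`, and `setKeyValue` advances eagerly when no split reader is active, as in the hunk above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy (not elephant-bird code): setKeyValue advances when no split reader is active. */
public class EagerAdvanceDemo {
    static final class Holder { String v; }

    /** Per-split reader: fills the caller's holder, or a private one if none was set. */
    static final class SplitReader {
        private final Iterator<String> it;
        private Holder target = new Holder();
        SplitReader(List<String> records) { it = records.iterator(); }
        void setKeyValue(Holder h) { target = h; }
        boolean nextKeyValue() {
            if (!it.hasNext()) return false;
            target.v = it.next();
            return true;
        }
        boolean hasMore() { return it.hasNext(); }
    }

    static final class CompositeReader {
        private final Iterator<List<String>> splits;
        private SplitReader current; // set to null after the last record of a split

        CompositeReader(List<List<String>> splits) {
            this.splits = splits.iterator();
            if (this.splits.hasNext()) current = new SplitReader(this.splits.next());
        }

        void setKeyValue(Holder h) {
            if (current == null && !nextKeyValue()) {
                return; // no splits left
            }
            // If current was null, the nextKeyValue() call above already read a record
            // into the new split reader's private holder, so the caller never sees it.
            if (current != null) current.setKeyValue(h);
        }

        boolean nextKeyValue() {
            while (current == null || !current.hasMore()) {
                if (!splits.hasNext()) return false;
                current = new SplitReader(splits.next());
            }
            boolean ok = current.nextKeyValue();
            if (!current.hasMore()) current = null; // end of split
            return ok;
        }
    }

    /** Drives the reader in the wrapper's order: setKeyValue, then nextKeyValue. */
    static List<String> drain(CompositeReader r) {
        Holder h = new Holder();
        List<String> seen = new ArrayList<>();
        while (true) {
            r.setKeyValue(h);
            if (!r.nextKeyValue()) return seen;
            seen.add(h.v);
        }
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c", "d"));
        System.out.println(drain(new CompositeReader(splits))); // [a, b, d]
    }
}
```

With splits `[a, b]` and `[c, d]`, the first record of the second split lands in the new split reader's private holder and never reaches the caller, which is exactly the skipped record the question anticipates.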
You are right, the interface makes it possible to skip over a record. Re-reading the code, it might work to have setKeyValue simply set this.key and this.value when currentRecordReader is null; the next call to nextKeyValue will then re-initialize currentRecordReader and call currentRecordReader.setKeyValue on the "cached" key/value.
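A sketch of this deferred approach, using the same kind of hypothetical toy (invented names, not the real CompositeRecordReader, value only): `setKeyValue` only records the caller's holder and forwards it if a split reader is active, and `nextKeyValue` re-applies the cached holder to each split reader it creates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy (not elephant-bird code): cache the holder, never advance in setKeyValue. */
public class DeferredHolderDemo {
    static final class Holder { String v; }

    /** Per-split reader: fills the caller's holder, or a private one if none was set. */
    static final class SplitReader {
        private final Iterator<String> it;
        private Holder target = new Holder();
        SplitReader(List<String> records) { it = records.iterator(); }
        void setKeyValue(Holder h) { target = h; }
        boolean nextKeyValue() {
            if (!it.hasNext()) return false;
            target.v = it.next();
            return true;
        }
        boolean hasMore() { return it.hasNext(); }
    }

    static final class CompositeReader {
        private final Iterator<List<String>> splits;
        private SplitReader current; // set to null after the last record of a split
        private Holder cached;       // last holder passed to setKeyValue

        CompositeReader(List<List<String>> splits) {
            this.splits = splits.iterator();
            if (this.splits.hasNext()) current = new SplitReader(this.splits.next());
        }

        void setKeyValue(Holder h) {
            cached = h;                                  // always remember the caller's holder
            if (current != null) current.setKeyValue(h); // forward only; never advance here
        }

        boolean nextKeyValue() {
            while (current == null || !current.hasMore()) {
                if (!splits.hasNext()) return false;
                current = new SplitReader(splits.next());
                if (cached != null) current.setKeyValue(cached); // re-apply the cached holder
            }
            boolean ok = current.nextKeyValue();
            if (!current.hasMore()) current = null; // end of split
            return ok;
        }
    }

    /** Drives the reader in the wrapper's order: setKeyValue, then nextKeyValue. */
    static List<String> drain(CompositeReader r) {
        Holder h = new Holder();
        List<String> seen = new ArrayList<>();
        while (true) {
            r.setKeyValue(h);
            if (!r.nextKeyValue()) return seen;
            seen.add(h.v);
        }
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c", "d"));
        System.out.println(drain(new CompositeReader(splits))); // [a, b, c, d]
    }
}
```

Within a single reader instance this neither throws nor skips; it does not address carrying the cached holder across separate reader instances.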
Thoughts?
I think from my reading that should work; this stuff is unfortunately hard to parse :/
Ping on this: got time to update as we discussed?
Hi, sorry for not getting back to you earlier. I tried out the update, and it fixes the issue for one set of splits. However, there needs to be a good way to persist the most recently set key/value pair between instances of CompositeRecordReader, which I have not revisited yet. Here is the stack trace seen when the solution I outlined is tried:

```
Caused by: java.io.IOException: The RecordReader returned a key and value that do not match the key and value sent to it. This means the RecordReader did not properly implement com.twitter.elephantbird.mapred.input.MapredInputFormatCompatible. Current reader class : class com.twitter.elephantbird.mapreduce.input.combine.CompositeRecordReader
```

I'm going to spend some time on this this week.
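The error message suggests the wrapper checks that the reader hands back the very objects passed to setKeyValue. The hypothetical toy below (invented names, not the wrapper's actual code) shows why a reader that never received the cached holder, for example a freshly created instance, trips such an identity check even though it produces the right data.

```java
import java.io.IOException;

/** Hypothetical toy illustrating a "same holder in, same holder out" contract. */
public class HolderIdentityDemo {
    static final class Holder { String v; }

    /** Reader that fills the holder it was given, or allocates its own if it never got one. */
    static final class Reader {
        private Holder target;
        void setValue(Holder h) { target = h; }
        Holder next(String record) {
            if (target == null) target = new Holder(); // the caller's holder was never passed in
            target.v = record;
            return target;
        }
    }

    /** Identity check in the spirit of the wrapper's error: compares references, not contents. */
    static void checkSame(Holder sent, Holder returned) throws IOException {
        if (sent != returned) {
            throw new IOException("returned value does not match the value sent to it");
        }
    }

    public static void main(String[] args) throws IOException {
        Holder h = new Holder();
        Reader informed = new Reader();
        informed.setValue(h);
        checkSame(h, informed.next("r1")); // passes: same object

        Reader fresh = new Reader(); // e.g. a new instance that never received h
        try {
            checkSame(h, fresh.next("r2"));
        } catch (IOException e) {
            System.out.println("mismatch: " + e.getMessage());
        }
    }
}
```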
Venugopal Gummuluru seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
When iterating over input splits via DeprecatedInputFormatWrapper, DeprecatedInputFormatWrapper.java always calls mifcReader.setKeyValue(key, value) before nextKeyValue is invoked, and that call can reach setKeyValue in CompositeRecordReader.java. setKeyValue requires currentRecordReader to be non-null; however, currentRecordReader is set to null at line 113 at the end of every input split, so the next call to setKeyValue after an input split ends throws an NPE.
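The failure sequence can be reproduced with a hypothetical toy (invented classes, not elephant-bird code, value only) that, like the unpatched reader, drops its per-split reader at the end of a split but dereferences it unconditionally in setKeyValue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy (not elephant-bird code) reproducing the unpatched NPE. */
public class NpeRepro {
    static final class Holder { String v; }

    /** Per-split reader: fills the caller's holder, or a private one if none was set. */
    static final class SplitReader {
        private final Iterator<String> it;
        private Holder target = new Holder();
        SplitReader(List<String> records) { it = records.iterator(); }
        void setKeyValue(Holder h) { target = h; }
        boolean nextKeyValue() {
            if (!it.hasNext()) return false;
            target.v = it.next();
            return true;
        }
        boolean hasMore() { return it.hasNext(); }
    }

    /** Unpatched composite: drops its split reader at the end of a split. */
    static final class CompositeReader {
        private final Iterator<List<String>> splits;
        private SplitReader current; // set to null after the last record of a split

        CompositeReader(List<List<String>> splits) {
            this.splits = splits.iterator();
            if (this.splits.hasNext()) current = new SplitReader(this.splits.next());
        }

        void setKeyValue(Holder h) {
            current.setKeyValue(h); // NullPointerException once current is null
        }

        boolean nextKeyValue() {
            while (current == null || !current.hasMore()) {
                if (!splits.hasNext()) return false;
                current = new SplitReader(splits.next());
            }
            boolean ok = current.nextKeyValue();
            if (!current.hasMore()) current = null; // end of split
            return ok;
        }
    }

    /** Drives the reader in the wrapper's order, collecting records into seen. */
    static void drain(CompositeReader r, List<String> seen) {
        Holder h = new Holder();
        while (true) {
            r.setKeyValue(h);
            if (!r.nextKeyValue()) return;
            seen.add(h.v);
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        try {
            drain(new CompositeReader(Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"))), seen);
        } catch (NullPointerException e) {
            System.out.println("NPE after reading " + seen); // NPE after reading [a, b]
        }
    }
}
```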
This patch addresses the situation by having setKeyValue check whether currentRecordReader is null and, if it is, invoke nextKeyValue to see whether any more elements remain.