Exclude files with matching sigs #6

Open
wants to merge 10 commits into
base: master

Conversation

Contributor

@tombburnell tombburnell commented Feb 3, 2017

This PR helps cope with a major issue where duplicate files get processed multiple times. It should ignore any files when more than one has the same signature.

  • I've avoided opening the file twice (once for the sig, once for processing)
  • I've made it cope with the case where more than 2 files have the same sig
  • I've added METRICs for errors, duplicates and truncations so we can monitor them in Splunk
  • and added a debug flag
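
The behaviour described above could be sketched roughly as follows. This is an illustration, not the code in this PR: `read_signature`, `exclude_duplicate_sigs` and `SIGNATURE_LENGTH` are hypothetical names, and the sketch assumes the signature is simply the first N bytes of the file.

```python
from collections import defaultdict

SIGNATURE_LENGTH = 16  # hypothetical; sender keeps its own value in self.signatureLength


def read_signature(path, length=SIGNATURE_LENGTH):
    """Return the first `length` bytes of the file as its signature (assumed scheme)."""
    with open(path, "rb") as f:
        return f.read(length)


def exclude_duplicate_sigs(paths):
    """Split paths into (unique, duplicates) by signature.

    Every file in a group of two or more matching signatures is excluded,
    so groups of three or more are handled the same way as pairs.
    """
    by_sig = defaultdict(list)
    for path in paths:
        by_sig[read_signature(path)].append(path)

    unique, duplicates = [], []
    for group in by_sig.values():
        (unique if len(group) == 1 else duplicates).extend(group)
    return sorted(unique), sorted(duplicates)
```

Only the `unique` list would then go on to processing; the `duplicates` list is what a duplicate METRIC line would report.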

sender Outdated
    f.close();

finally:
    f.close();
Contributor

The file might be closed twice, possibly resulting in an exception.
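
One way to sidestep the double-close question is to let a with block own the file, so it is closed exactly once on every exit path. For Python's built-in file objects a second close() is documented as having no effect, so the code above may not actually raise, but a single owner is still easier to reason about. A minimal sketch, where `process_file` and `handle_line` are hypothetical names rather than functions from sender:

```python
def process_file(fn, handle_line):
    """Process each line of fn; the with block closes the file exactly once."""
    with open(fn) as f:
        for line in f:
            handle_line(line.rstrip("\n"))
    # The file is closed here whether the loop finished normally or raised.
    return f.closed
```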

sender Outdated
log("Exception: %s" % e)
log("METRIC ns=forwarder.error.preprocess file=\"%s\" exception=\"%s\"" % (fn, str(e).replace("\n", "")))
log("Exception=\"%s\"" % e)
f.close();
Contributor

The file might not be open at this point. To guarantee it, put the f = open before the try:
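
A sketch of that suggestion, under the assumption that a failed open should be logged and skipped: the open happens outside the try whose handlers touch f, so f is always bound and open when the cleanup runs. `preprocess` and its `log` parameter are hypothetical stand-ins for the code in sender.

```python
def preprocess(fn, log):
    """Open fn before the main try so cleanup only ever sees an open file."""
    try:
        f = open(fn)
    except IOError as e:  # on Python 3, IOError is an alias of OSError
        log("Could not open %s: %s" % (fn, e))
        return None

    try:
        return f.readline()
    except Exception as e:
        log("Exception=\"%s\"" % e)
        return None
    finally:
        f.close()  # f is guaranteed to be bound here
```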

sender Outdated
try:
    if not os.path.isfile(fn):
        log("File no longer exists: %s" % fn)
        continue
    if os.path.getsize(fn) < self.signatureLength and not self.isCompressed(fn):
        log("Skipping as sig too short or compressed: %s" % fn)
Contributor

Isn't the message here 'skipping as file too short and not compressed'?
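
The condition in the diff only fires when the file is both shorter than the signature length and not compressed. A small sketch that makes the cases explicit; `skip_reason` is a hypothetical helper, not part of sender, and the message text is the wording proposed in this comment.

```python
def skip_reason(size, signature_length, compressed):
    """Return a log message if the file should be skipped, else None.

    Mirrors the diff's condition: size < signatureLength and not compressed.
    """
    if size < signature_length and not compressed:
        return "Skipping as file too short and not compressed"
    return None
```

A short compressed file and a long uncompressed file both return None, so neither is skipped by this check.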

sender Outdated
@@ -319,6 +331,7 @@ class Tail(object):
if should_stop():
    return
for line in f:
    debug("line %s" % line.replace("\n",""))
Contributor

Calls like this will first resolve the string template in production, only to find out that debug is off. For performance in production, the ideal is to guard the call:
if debug_on: debug(...)
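
A minimal sketch of the guard, plus an alternative using the standard logging module. The `debug_on` name comes from the comment above; the `debug` body, the `tail` functions and the "sender" logger name are assumptions for illustration only.

```python
import logging

debug_on = False  # assumed to be set from the new debug flag


def debug(msg):
    print("DEBUG " + msg)


def tail(lines):
    for line in lines:
        # The guard skips both replace() and the % formatting when debug is off.
        if debug_on:
            debug("line %s" % line.replace("\n", ""))


# Alternative: pass the argument separately so the logging module defers the
# % formatting until the record is actually emitted. The rstrip() call is
# still evaluated eagerly; only the template formatting is deferred.
logger = logging.getLogger("sender")  # hypothetical logger name


def tail_with_logging(lines):
    for line in lines:
        logger.debug("line %s", line.rstrip("\n"))
```

If even the argument is expensive to compute, `logger.isEnabledFor(logging.DEBUG)` can serve as the same kind of guard.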

@@ -0,0 +1,217 @@
#!/usr/bin/env awk -f
Contributor

Is this file part of this PR?


2 participants