dlq-logging: reintroduce structured payload where possible #1105
Conversation
Force-pushed from 718cd0f to a0be29f.
LGTM
Left a couple of doubts, but I like the idea to keep it simple (this is more direct than #1104).
Nice to have the tests that verify the structure, to avoid future regressions like this.
@@ -387,7 +387,7 @@ def handle_event_mapping_errors(event_mapping_errors)
  event_mapping_errors.each do |event_mapping_error|
    detailed_message = "#{event_mapping_error.message}; event: `#{event_mapping_error.event.to_hash_with_metadata}`"
-    handle_dlq_status(event_mapping_error.event, :warn, detailed_message)
+    @dlq_writer ? @dlq_writer.write(event_mapping_error.event, detailed_message) : @logger.warn(detailed_message)
In the case of JSON log format, this logs a JSON document whose message contains the event map converted to a string.
Is that desired, or do you think it should keep the document structure, nested in an event field, like you suggested in a comment?
I like to keep things simple, without many moving parts, but this reintroduces a "code duplication"
if @dlq_writer
  @dlq_writer.write
else
  @logger.log
end
that the original #1084 tried to avoid.
The original logging for dropped-events-when-DLQ-was-unavailable included the action tuple that was the cause of a rejection from Elasticsearch, but these events are being routed to DLQ because they cannot generate a valid action tuple -- we only have our own pre-request rejection reason and the event itself.
While #1084 tried to avoid this particular DLQ-or-LOG duplication, the related #1080 also introduced paths in this method that differ based on the availability of DLQ -- the abstraction has leaked already.
Looking at this again, it also doesn't make much sense for our detailed_message, which is added to the DLQ as metadata-about-why-the-event-was-invalid, to contain a stringified version of the entire event, since the DLQ entry itself will contain the entire event.
What about this (pseudo-code):
error_message, event = event_mapping_error.message, event_mapping_error.event
@dlq_writer ? @dlq_writer.write(event, error_message)
: @logger.warn("Could not index event to Elasticsearch: #{error_message}", event: event.to_hash_with_metadata)
In this way, the DLQ entry would have a message like:
"Badly formatted index, after interpolation still contains placeholder: [%{indexprefix}-foo]"
"Elasticsearch doesn't support [DANCE] action"
And when the DLQ is disabled the logger would have entries like:
{"message":"Could not index event to Elasticsearch: Badly formatted index, after interpolation still contains placeholder: [%{indexprefix}-foo]", "event": { ... } }
{"message":"Could not index event to Elasticsearch: Elasticsearch doesn't support [DANCE] action", "event": {...} }
I like it, go for it 👍
logger = double("logger").as_null_object
subject.instance_variable_set(:@logger, logger)
expect(logger).to receive(:warn).with(a_string_including "Could not index event to Elasticsearch. status: some_status, action: [:action, :params, {")
# logger = double("logger").as_null_object
Remove commented code
logger = double("logger").as_null_object
subject.instance_variable_set(:@logger, logger)
expect(logger).to receive(:warn).with(a_string_including "Could not index event to Elasticsearch. status: some_status, action: [:action, :params, {")
# logger = double("logger").as_null_object
Remove commented code
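(For context only: a sketch of how such an expectation might look once the structured logger.warn(message, event: ...) shape proposed above is adopted; the matcher arguments here are illustrative, not the spec as merged.)
logger = double("logger").as_null_object
subject.instance_variable_set(:@logger, logger)
# Expect the rejection reason as the message and the event as a structured field.
expect(logger).to receive(:warn).with(
  a_string_including("Could not index event to Elasticsearch"),
  hash_including(:event)
)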
Force-pushed from a0be29f to f69969c.
Because the effective code changes were previously approved and specs are still green, I have updated the changelog and will merge.
  @actual_hits_count = actual&.dig(*hits_count_path)
  values_match? expected, @actual_hits_count
end
failure_message do |actual|
review note: the changes to the es_spec_helper give previously-flaky specs more understandable failure causes, but those specs are no longer flaky :)
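(For readers without the full diff: a condensed sketch of the kind of custom RSpec matcher the snippet above comes from; the matcher name and the hits_count_path used here are illustrative, not necessarily what the helper defines.)
RSpec::Matchers.define :have_hits do |expected|
  hits_count_path = %w(hits total value) # illustrative path into the search response

  match do |actual|
    @actual_hits_count = actual&.dig(*hits_count_path)
    values_match? expected, @actual_hits_count
  end

  failure_message do |actual|
    "expected search response to have #{expected} hits, got #{@actual_hits_count.inspect}; response: #{actual.inspect}"
  end
end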
-elasticsearch_client.indices.delete(:index => '*') rescue nil
+elasticsearch_client.indices.delete(:index => '*') rescue puts("DELETE INDICES ERROR: #{$!}")
+# Since we are pinned to ES client 7.x, we need to delete data streams the hard way...
+elasticsearch_client.perform_request("DELETE", "/_data_stream/*") rescue puts("DELETE DATA STREAMS ERROR: #{$!}")
In ES 8.x, deleting indices doesn't delete data streams, which was causing some of the specs here to pull in more results than they were intending if-and-only-if they were executed after specs that had created data streams without cleaning up after themselves.
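(A minimal sketch of the resulting cleanup, assuming the spec helper exposes an elasticsearch_client; the helper method name here is illustrative.)
def clear_indices_and_data_streams(elasticsearch_client)
  # On ES 8.x, deleting indices does not remove data streams...
  elasticsearch_client.indices.delete(:index => '*') rescue puts("DELETE INDICES ERROR: #{$!}")
  # ...and the pinned 7.x client has no data-stream helper, so issue the request directly.
  elasticsearch_client.perform_request("DELETE", "/_data_stream/*") rescue puts("DELETE DATA STREAMS ERROR: #{$!}")
end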
Alternative to #1104.
I authored this before getting derailed by a more urgent task, and just saw that @andsel had replicated my work in my off-hours.
In this variant, because Elasticsearch#handle_event_mapping_errors already was making decisions based on the value of @dlq_writer, I didn't see the need to constrain the already-leaked DLQ abstraction to Elasticsearch::Common#handle_dlq_status, and felt that working with @dlq_writer directly was more readable and easier to make sense of.
Fixes: #1103