Skip to content

dlq-logging: reintroduce structured payload where possible #1105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

yaauie
Copy link
Contributor

@yaauie yaauie commented Dec 21, 2022

Alternative to #1104.

I authored this before getting derailed by a more urgent task, and just saw that @andsel had replicated my work in my off-hours.

In this variant, because the Elasticsearch#handle_event_mapping_errors already was making decisions based on the value of @dlq_writer, I didn't see the need to constrain the already-leaked DLQ abstraction to Elasticsearch::Common#handle_dlq_status and felt that working with @dlq_writer directly was more readable and easier to make sense of.


Fixes: #1103

@yaauie yaauie force-pushed the reintroduce-structured-dlq-logs branch from 718cd0f to a0be29f Compare December 22, 2022 00:46
Copy link
Contributor

@andsel andsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Left a couple of doubts, but I like the idea to keep it simple (this is more direct than #1104).
Nice to have the tests that verifies the structure to avoid future regressions like this.

@@ -387,7 +387,7 @@ def handle_event_mapping_errors(event_mapping_errors)

event_mapping_errors.each do |event_mapping_error|
detailed_message = "#{event_mapping_error.message}; event: `#{event_mapping_error.event.to_hash_with_metadata}`"
handle_dlq_status(event_mapping_error.event, :warn, detailed_message)
@dlq_writer ? @dlq_writer.write(event_mapping_error.event, detailed_message) : @logger.warn(detailed_message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of json log format, it logs a json document with message containing the event map converted to a string.
Is that desired or do you think it should keep document structure, nested in a event field, like you suggested in comment?

I like to keep things simple, without many moving parts, but this reintroduces a "code duplication"

if @dlq_writer
  @dlq_writer.write
else 
    @logger.log
end    

that the original #1084 tried to avoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original logging for dropped-events-when-DLQ-was-unavailable included the action tuple that was the cause of a rejection from Elasticsearch, but these events are being routed to DLQ because they cannot generate a valid action tuple -- we only have our own pre-request rejection reason and the event itself.

While #1084 tried to avoid this particular DLQ-or-LOG duplication, the related #1080 also introduced paths in this method that differ based on the availability of DLQ -- the abstraction has leaked already.

Looking at this again, it also doesn't make much sense for our detailed_message that is being added to the DLQ as metadata-about-why-the-event-was-invalid to contain a stringified version of the entire event, since the entry itself will contain the entire event.

What about this (pseudo-code):

error_message, event = event_mapping_error.message, event_mapping_error.event
@dlq_writer ? @dlq_writer.write(event, error_message)
            : @logger.warn("Could not index event to Elasticsearch: #{error_message}", event: event.to_hash_with_metadata)

In this way, the DLQ entry would have a message like:

"Badly formatted index, after interpolation still contains placeholder: [%{indexprefix}-foo]"
"Elasticsearch doesn't support [DANCE] action"

And when the DLQ is disabled the logger would have entries like:

{"message":"Could not index event to Elasticsearch: Badly formatted index, after interpolation still contains placeholder: [%{indexprefix}-foo]", "event": { ... } }
{"message":"Could not index event to Elasticsearch: Elasticsearch doesn't support [DANCE] action", "event": {...} }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, go for it 👍

logger = double("logger").as_null_object
subject.instance_variable_set(:@logger, logger)
expect(logger).to receive(:warn).with(a_string_including "Could not index event to Elasticsearch. status: some_status, action: [:action, :params, {")
# logger = double("logger").as_null_object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented code

logger = double("logger").as_null_object
subject.instance_variable_set(:@logger, logger)
expect(logger).to receive(:warn).with(a_string_including "Could not index event to Elasticsearch. status: some_status, action: [:action, :params, {")
# logger = double("logger").as_null_object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented code

@yaauie yaauie force-pushed the reintroduce-structured-dlq-logs branch from a0be29f to f69969c Compare April 27, 2023 20:34
Copy link
Contributor Author

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the effective code changes were previously approved and specs are still green, I have updated the changelog and will merge.

@actual_hits_count = actual&.dig(*hits_count_path)
values_match? expected, @actual_hits_count
end
failure_message do |actual|
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review note: the changes to the es_spec_helper make previously-flaky specs have more understandable causes, but those specs are no longer flaky :)

elasticsearch_client.indices.delete(:index => '*') rescue nil
elasticsearch_client.indices.delete(:index => '*') rescue puts("DELETE INDICES ERROR: #{$!}")
# Since we are pinned to ES client 7.x, we need to delete data streams the hard way...
elasticsearch_client.perform_request("DELETE", "/_data_stream/*") rescue puts("DELETE DATA STREAMS ERROR: #{$!}")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ES 8.x, deleting indices doesn't delete data streams, which was causing some of the specs here to pull in more results than they were intending if-and-only-if they were executed after specs that had created data streams without cleaning up after themselves.

@yaauie yaauie merged commit f4e1820 into logstash-plugins:main Apr 27, 2023
@yaauie yaauie deleted the reintroduce-structured-dlq-logs branch April 27, 2023 22:25
@yaauie yaauie restored the reintroduce-structured-dlq-logs branch April 27, 2023 22:27
@yaauie yaauie deleted the reintroduce-structured-dlq-logs branch April 27, 2023 22:27
@yaauie yaauie mentioned this pull request Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DLQ error messages squashed in string instead of keeping structure when logging in json
3 participants