https://platformengineering.org logo
#general

Swarna Mani

01/27/2023, 12:05 AM
Log aggregation and analysis: We are looking for a cost-effective, scalable, and reliable log analysis solution. We currently use a mix of CloudWatch, Datadog, and Sumo Logic. Is there anything you can share about how you resolved a similar issue? What was your architecture? Any open source tooling? What paid tools did you use? Thanks

Nathan Hruby

01/27/2023, 1:06 AM
This is guided by:
• log volume and rate
• use cases for queries
• end users (security devs, ops, etc.)
• structure of events
• retention time
You probably have some mix of folks needing fast, reliable eventing that creates metrics and metadata on the stream volume, highly detailed and relatable events (like traces), and bulk queries to answer security and product type questions, which probably means you need multiple tools/datastores.
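A rough way to put numbers on the volume, rate, and retention factors is a back-of-the-envelope storage estimate. This is only a sketch: the function name, the compression parameter, and the example stream figures are all made up for illustration, and real tools add index and replication overhead on top.

```python
def estimate_stored_bytes(
    events_per_second: float,
    avg_event_bytes: float,
    retention_days: float,
    compression_ratio: float = 1.0,
) -> float:
    """Rough steady-state storage footprint for one log stream.

    compression_ratio is an assumption you supply (raw size / stored size);
    1.0 means no compression.
    """
    seconds_per_day = 86_400
    raw_bytes = events_per_second * avg_event_bytes * seconds_per_day * retention_days
    return raw_bytes / compression_ratio


# Hypothetical stream: 2,000 events/s, 500 bytes each, kept for 30 days.
footprint = estimate_stored_bytes(2_000, 500, 30)
print(f"{footprint / 1e12:.2f} TB")  # prints "2.59 TB"
```

Running this per stream and per retention tier gives a first idea of which datastore ends up holding most of the data.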

Swarna Mani

01/27/2023, 4:26 AM
What tools would you recommend?
Thanks Nathan

Endre Karlson

01/27/2023, 7:03 AM
If you want something that seems promising as a newer alternative to OpenSearch, check out Quickwit.

Nathan Hruby

01/27/2023, 3:11 PM
Grafana Loki and Graylog are two tools not in your list that I've used before and work pretty well
oh, hahah, and logz.io and elastic.co of course 🙂
and if you wanted to "just do it yourself in AWS" this is a good high level overview

Swarna Mani

01/27/2023, 5:00 PM
Thanks @Nathan Hruby

Ric McLaughlin

01/27/2023, 6:57 PM
Seen a growing use of Athena (as described in that Aha! post) for the log analysis use case at a cost way, way lower than CW Logs.
For some element of real-time analysis, I'd be doing something like: Kinesis -> stream filters (to trigger a Lambda) -> Firehose -> OpenSearch collection, and then OpenSearch Serverless for analysis.
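The Lambda step in that pipeline could look roughly like the sketch below. It relies on Kinesis-triggered Lambdas receiving records with base64-encoded payloads under `Records[].kinesis.data`. The JSON log format, the `level` field, and the "keep only ERROR" rule are assumptions for illustration, not something from this thread.

```python
import base64
import json


def handler(event, context=None):
    """Decode Kinesis records and keep only the ones worth acting on."""
    matches = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            log_event = json.loads(payload)
        except ValueError:
            # Not valid JSON (or not valid UTF-8): skip rather than fail the batch.
            continue
        # Hypothetical filter: assumes structured logs with a "level" field.
        if isinstance(log_event, dict) and log_event.get("level") == "ERROR":
            matches.append(log_event)
    return {"matched": len(matches), "events": matches}
```

In a real setup the matched events would go on to whatever does the alerting; here they are just returned so the filter is easy to test locally.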

Nathan Hruby

01/27/2023, 7:34 PM
OpenSearch just gets expensive at massive data quantities. This is where a Firehose that splits the data into S3 for longer durations (months, years, etc.) and OpenSearch for the short term (days, weeks) gets very cool.
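A minimal sketch of that hot/cold split, seen from the query side: recent time ranges go to OpenSearch, older ones to S3 (for example via Athena). The 14-day hot window, the key layout, and both function names are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumed hot-tier window; the thread only says "days, weeks".
HOT_RETENTION = timedelta(days=14)


def s3_prefix(stream: str, ts: datetime) -> str:
    """Date-partitioned prefix (hypothetical layout) so queries can prune by date."""
    return f"logs/{stream}/dt={ts:%Y-%m-%d}/hour={ts:%H}/"


def stores_for_query(start: datetime, end: datetime, now: datetime) -> list[str]:
    """Return which datastore(s) a query over [start, end] needs to hit."""
    hot_cutoff = now - HOT_RETENTION
    stores = []
    if end >= hot_cutoff:
        stores.append("opensearch")  # short-term data lives here
    if start < hot_cutoff:
        stores.append("s3")  # long-term archive
    return stores


now = datetime(2023, 1, 27, 19, 34, tzinfo=timezone.utc)
print(stores_for_query(now - timedelta(days=3), now, now))
print(stores_for_query(now - timedelta(days=60), now - timedelta(days=30), now))
print(s3_prefix("app", now))
```

A query that straddles the cutoff hits both stores, which is the case worth deciding on up front (merge results, or just keep the hot window generous).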

Daniel Serodio

02/10/2023, 7:33 PM
Coming late to this thread since I just found it via the newsletter, but ChaosSearch (SaaS) seems really interesting for this. I've watched a demo but haven't implemented it yet.

Endre Karlson

02/12/2023, 4:09 PM
@Daniel Serodio Isn't that like what Quickwit does, then? They also scale out on S3, Elastic-ish.

Daniel Serodio

02/23/2023, 4:36 PM
Never heard of Quickwit before, but looking at its homepage, it looks like ChaosSearch is 100% SaaS/"Serverless" (no need to worry about servers) while Quickwit is self-hosted/self-managed.

Endre Karlson

02/23/2023, 4:45 PM
depends on what you need / want :)

Ashish Hanwadikar

02/27/2023, 12:00 AM
We, kloudfuse.com, have a unified observability platform that addresses logging, metrics, traces, Kubernetes events, etc., all in a single database that is custom designed from the ground up for storing time series and event records. We have a free-to-download version that you can use forever. I will be happy to demo the product (you can also get a live demo from our website). Because of the storage efficiency and ability to run in your VPC, the solution is extremely cost effective, providing 70-90% cost savings compared to SaaS vendors. Hope you will try it out.

john hayes

06/09/2023, 9:28 AM
Really interesting thread! We were running a self-hosted instance of Graylog in K8s. It seemed pretty solid, and having the self-hosted option makes cost management easier. We ended up migrating away from it, though. Personally, I found the query language rather unintuitive, and it came with quite a maintenance overhead.