# general
k
hey guys, after ~4 years in python backend development and another ~6 years in cloud/k8s engineering I picked up a technical support engineer job for a VPN product. I'm receiving a lot of client debug bundles (application logs, plus separate files for OS status and general app status) from our customers, and I can barely keep up with reading/grepping/otherwise manually analysing them. Do you know of any (preferably locally runnable) tools that could help me build a log processing pipeline, or just analyse the bundles more efficiently, by:
• marking/annotating well-known keywords/patterns
• parsing data out of those patterns
• augmenting log lines with data from the other files in the bundle
I'm starting to wonder if there is anything better/easier than Kibana/OpenSearch Dashboards combined with a Python (or some other) data ingestion script
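roughly what I mean by tag/parse, as a minimal python sketch (the pattern names and regexes are made up, our real ones would go in the catalogue):

```python
import re

# hypothetical known-pattern catalogue: tag -> compiled regex with named groups
PATTERNS = {
    "handshake_fail": re.compile(r"handshake failed.*peer=(?P<peer>\S+)"),
    "reconnect": re.compile(r"reconnecting to (?P<peer>\S+) attempt (?P<attempt>\d+)"),
}

def tag_line(line):
    """Return (tag, extracted_fields) for the first matching pattern, else (None, {})."""
    for tag, rx in PATTERNS.items():
        m = rx.search(line)
        if m:
            return tag, m.groupdict()
    return None, {}

print(tag_line("2024-05-01 12:00:03 reconnecting to 10.0.0.7 attempt 3"))
# ('reconnect', {'peer': '10.0.0.7', 'attempt': '3'})
```

growing the list would just mean adding entries to `PATTERNS`, and the extracted fields could later be joined against the status files in the bundle.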
m
Depending on the size of those logs, asking an LLM to summarize them could be a good starting point. A simple python script that pushes them to even a locally running open model should get you pretty good results.
k
never went with anything LLM or AI related, got any ELI5 articles on getting started / learning what to expect of it? The logs are not big, a few MB at most per bundle. At the same time I'm quite confident I need an efficient way to repeatedly tag/parse them more than an LLM summarizer (summaries don't make much sense without the code or context, since these are pretty low-quality debug messages)
some customers have clients connecting to tens/hundreds of other clients, so it's simply too much noise to make sense of without carving out info about the specific connections and tagging it properly
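the carving part could be as simple as bucketing lines per connection first, something like (assuming, just for the example, that peer IDs show up as `peer=<id>` in the lines):

```python
import re
from collections import defaultdict

# assumption for this sketch: lines carry a "peer=<id>" token identifying the connection
PEER_RX = re.compile(r"peer=(\S+)")

def split_by_peer(lines):
    """Bucket log lines per peer so each connection can be read in isolation."""
    buckets = defaultdict(list)
    for line in lines:
        m = PEER_RX.search(line)
        buckets[m.group(1) if m else "_unmatched"].append(line)
    return dict(buckets)

lines = [
    "12:00:01 handshake ok peer=10.0.0.7",
    "12:00:02 keepalive timeout peer=10.0.0.9",
    "12:00:03 retransmit peer=10.0.0.7",
]
print(split_by_peer(lines))
```

then each bucket is small enough to read, grep, or feed onward one connection at a time.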
m
Check out ollama + its python integration. You should get a lot of results for that search query.
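a minimal sketch of the shape of it, talking to ollama's local REST API (the model name is an assumption, and it needs `ollama serve` running with that model pulled):

```python
import json
import urllib.request

def build_prompt(patterns, log_text):
    """Embed the known-pattern catalogue in the prompt so the model tags consistently."""
    catalogue = "\n".join(f"- {name}: {desc}" for name, desc in patterns.items())
    return (
        "You are analysing VPN client debug logs.\n"
        "Known patterns to tag:\n" + catalogue + "\n\n"
        "Tag each matching line and summarise the rest.\n\n"
        "LOG:\n" + log_text
    )

def ask_ollama(prompt, model="llama3"):  # model name is an assumption
    # ollama's default local endpoint; non-streaming response
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

patterns = {"reconnect": "reconnect attempts", "handshake_fail": "TLS handshake failures"}
prompt = build_prompt(patterns, "12:00:03 reconnecting to 10.0.0.7 attempt 3")
# print(ask_ollama(prompt))  # uncomment with a local ollama instance running
```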
k
if I could feed it a (growing) list of known patterns combined with the local client state then maybe something would come out of it indeed
thanks for the idea
m
you can tell it the patterns as part of context building before making the actual query with the log files.
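with the `ollama` python package's chat API that would look roughly like this (model name and pattern descriptions are placeholders):

```python
# pip install ollama; needs a local `ollama serve` with the model pulled
def build_messages(patterns, log_text):
    """Front-load the pattern catalogue as a system message, then send the logs."""
    catalogue = "\n".join(f"- {name}: {desc}" for name, desc in patterns.items())
    return [
        {"role": "system",
         "content": "You tag VPN client debug logs. Known patterns:\n" + catalogue},
        {"role": "user",
         "content": "Tag and summarise this log:\n" + log_text},
    ]

messages = build_messages({"reconnect": "reconnect attempts"}, "12:00:03 reconnecting to 10.0.0.7")

# actual query, uncomment with ollama running:
# import ollama
# reply = ollama.chat(model="llama3", messages=messages)  # model name is an assumption
# print(reply["message"]["content"])
```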
k
am i expected to train it on something, or would it work fine from the start with context provided ahead of the query?
m
no need to train, just the context should be enough. Fine-tuning might make each query use fewer resources, I think, but at the same time it costs more resources upfront for the training itself.
I would start with just the context ahead of the query and explore more if unsatisfied.