Monitoring streaming data with Kafka and whylogs

Photo by Chris Liverani on Unsplash

The importance of data in making informed decisions is now universally agreed upon for nearly every application. This also prompts the need for tools that enable us to make use of this data in a sensible and efficient way.

In this article, I’d like to share an approach to leveraging streaming data: setting up a monitoring dashboard built on logged statistical profiles of the streamed data. To do so, I’ll use an underwater Remotely Operated Vehicle (ROV) as a use case. More specifically, we’ll be monitoring for faults of an OpenROV v2.8 from the late OpenROV (now Sofar)…

Hands-on Tutorials

With Google Data Studio, lakeFS and Great Expectations

Photo by Nathan Dumlao on Unsplash

Like everything in life, machine learning models go stale. In a world of ever-changing, non-stationary data, everyone needs to go back to school and refresh their skills once in a while, and your model is no different.

We know that retraining our model is important, but when exactly should we do it? If we retrain too frequently, we end up wasting valuable time and effort, but retraining too seldom would surely hurt the quality of our predictions. Unfortunately, there is no one-size-fits-all answer. Each case should be carefully assessed in order to determine the impact of staleness.

In this article…

While taking technical debt into consideration

Photo by Teslariu Mihai on Unsplash

Your ML Model has graduated. Now it needs a job.

After you’ve properly trained and validated your ML model, it is time for it to do what it was made for: to serve. One common way to do that is to deploy it as a REST API. In this article, I want to share how I did that in a personal project: building a simple web application for fake news detection.

But also, I’d like to take this opportunity to discuss some aspects of a very important matter: technical debt.

Nowadays, it’s relatively fast to deploy ML systems, but it is very easy to overlook how difficult and expensive…

Getting Started

Keep your machine learning projects under control

Photo by Annie Spratt on Unsplash

To track and reproduce

From personal experience, I have come to realize that tracking machine learning experiments is important. This realization was eventually followed by another one: tracking machine learning experiments is hard.

Consider these situations:

  • You are tuning your model. During the process, you find an error in your training pipeline. Or maybe you get hold of a bigger, improved input dataset. You can’t compare apples to oranges, and so you have to repeat all your previous experiments. …

Photo by NordWood Themes on Unsplash

Looking for an excuse to learn a bit more about web scraping and Google Data Studio, I decided to begin a project based on my wife’s commercial Instagram profile. The goal was to build an online, updatable dashboard with some useful metrics, like top hashtags, frequently used words, and the distribution of posts by weekday.

Felipe de Pontes Adachi
