3 Beginner Mistakes I’ve Made in My Data Science Career




1. Believing Complex Algorithms Always Result in Better Solutions

“So what are the characteristics of these clustered residents?” my manager asked.

We had used the most advanced, recently released model to segment the residents of a smart city. The whole model was a black box, so we had no idea how it did the segmentation, but it gave the most accurate clusters.

I thought for a minute; I couldn’t come up with an answer. Our model had no interpretability.

I hadn’t learned my lesson, though. Later, a client turned down our proof of concept for a potential project.

“This solution looks promising but let us get back to you. The investment of deploying this solution might be a bit too high.”

We had proposed a computer vision system to estimate the mass of fish, using state-of-the-art object detection and depth estimation models. Still, we hadn’t accounted for the expensive GPU-based computation that came with it.

Whenever I was presented with a problem to solve, my brain would jump straight to neural networks and complex algorithms.

Computer vision? Convolutional Neural Networks. NLP? Transformers. Synthetic data generation? GANs. Tabular data? XGBoost. In a nutshell, I’d opt for the most complex algorithm out there because I believed that the greater the complexity, the better the solution.

To some extent, this idea is true, especially when you want to win Kaggle competitions. That’s how these algorithms got popular in the first place.

Here’s the twist: In the real world, a 2% improvement in accuracy need not be as significant as it is in hackathons and competitions. The interpretability and the operational cost of a solution matter much more.

How to avoid this:

I have only one piece of advice to give you, which has worked for me ever since I understood how the real world and businesses work. This trick has made our clients the happiest and my life the easiest.

Are you ready for it?

Start simple.

You heard me. Start with simple machine learning algorithms. There’s no point in complicating things upfront. Start with simpler solutions, which are more interpretable and cost-efficient. There’s no harm in experimenting with linear or logistic regression.

Christoph Molnar has written a gem of a book, Interpretable Machine Learning, on how to use interpretable machine learning techniques. It’s always a good idea to keep yourself educated on these topics.

If the performance is satisfactory, you’re good to go. If not, level up to a slightly more complex one while accounting for the trade-off in interpretability and operational costs. This way, everyone’s expectations are bound to be met. Win-win.
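
To make the "start simple, level up only if needed" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Candidate` class, the `pick_simplest` helper, and the scores and monthly costs are hypothetical placeholders, not numbers from any real project. In practice, `evaluate` would train the model and return a validation score.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candidate:
    name: str                       # human-readable model name
    evaluate: Callable[[], float]   # returns a validation score in [0, 1]
    cost_per_month: float           # hypothetical operating cost estimate


def pick_simplest(
    candidates: Sequence[Candidate],
    target_score: float,
    budget: float,
) -> Optional[Candidate]:
    """Return the first candidate that meets the target score within budget.

    Candidates must be ordered from simplest to most complex, so the
    first match is also the most interpretable acceptable option.
    """
    for candidate in candidates:
        if candidate.cost_per_month > budget:
            continue  # too expensive to operate, so skip it without evaluating
        if candidate.evaluate() >= target_score:
            return candidate
    return None  # nothing acceptable: revisit the target or the budget


# Placeholder scores and costs, ordered from simplest to most complex.
ladder = [
    Candidate("logistic regression", lambda: 0.86, 50.0),
    Candidate("gradient boosted trees", lambda: 0.89, 200.0),
    Candidate("deep neural network", lambda: 0.91, 2500.0),
]

chosen = pick_simplest(ladder, target_score=0.85, budget=500.0)
print(chosen.name if chosen else "no candidate met the target")
```

With these placeholder numbers the search stops at logistic regression, because it already clears the target. The more complex models are only evaluated if the simpler ones fall short, and the GPU-hungry option is ruled out by the budget before any compute is spent on it.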





via WordPress https://ramseyelbasheer.io/2021/04/30/3-beginner-mistakes-ive-made-in-my-data-science-career/
