Do You Need the 2021 MacBook Pro for Data Science?




Insights from a person using both an M1 and Intel Mac daily

Hello there, friends! On October 18, Apple finally unveiled their refreshed MacBook Pro line, and wow, do those devices look gorgeous! It’s no secret that I’m a big fan of Apple, and while I’d love to own all their devices, I (like you) have to be conscientious about which devices I choose to own and how often I upgrade them. One of the big things Apple touted in today’s keynote was the astounding power of the new M1 Pro and M1 Max chips, and you might be wondering: is it worth upgrading to this new hardware for my data science work?

Obviously, I haven’t used those devices myself, but I can offer insight from the perspective of somebody who uses both an M1 Mac and an Intel Mac every day. More specifically, I use both an M1 Mac mini and an M1 iPad Pro for my personal data science work, and for my regular job as a machine learning engineer, my employer has provided me with the latest 16″ Intel MacBook Pro. So even though I don’t have hands-on experience with the new chips in the 2021 MacBook Pros, I’d say my daily experience across these older chips provides a clear enough verdict for most people.

Apple is touting some impressive performance from these new chips, and while I have no reason to doubt it, is it something you need? Before we get into that question, I think there’s a bigger elephant in the room to talk about: Apple’s custom silicon.

New Custom Apple Silicon

For well over a decade now, Apple has been powering the MacBook line with chips supplied by Intel. In tandem with its excellent, if closed, operating system, macOS, the Intel Macs were consistently praised for their performance. In 2010, Apple started making its own silicon for the iPhone and iPad lines: the A-series chips, which still exist today. The A-series made great strides every year, which left many people asking: will Apple ever try their hand at making their own chip for the Mac line?

Cut to 2020, and Apple did just that with the M1 chip. This chip heralded the end of an era for Intel (which Intel is none too happy about), and while today saw the introduction of the M1 Pro and M1 Max chips for the larger MacBook Pros, the M1 chip itself has been around for about a year now.

The trouble is that chips are designed around their own architecture. Intel chips use an architecture called x86, and most software programs and programming language toolchains have long been built for this x86 architecture. Apple has chosen to do, well, what Apple chooses to do: design their own chips, which the community simply calls Apple Silicon, and base them on the ARM architecture (arm64) rather than x86. Now, Apple has promised via a translation layer called Rosetta 2 that software built for x86 will run without any problem on Apple Silicon. For the most part, I can say that’s true.

But of course… it’s not wholly true. At least not yet. As of when this post goes live, I have run into issues here and there that, while not unsolvable, gave me a bit of a headache as a machine learning engineer. (And thank goodness my Intel Mac had none of these problems!) Here are a few of the issues I’ve seen with my M1 Mac:

  • Pandas on Docker Desktop: Rosetta 2 works behind the scenes so that x86 programs keep working when run from the Terminal. This is required to install Pandas (and other similar libraries) on a machine today. Running pip install pandas directly in your M1’s Terminal should work just fine, but the same isn’t true with Docker Desktop. If you run a docker build command for an image that installs Pandas, I believe Docker tries to build for the Apple Silicon architecture directly, and the install fails. I’m not 100% sure that is the root cause, but I spent a full day looking for a workaround to install Pandas in my Docker images.
  • Anaconda: Anaconda currently does not natively support the Apple Silicon architecture. Honestly, this isn’t a big deal right now, since the good folks behind the Conda-Forge community have created a workaround, but it’s unclear whether new M1 users could use Anaconda at all on day 1. I personally got my M1 Mac mini several months after launch, but I imagine that day 1 users were out of luck here.
  • eGPU Support: The current M1 Macs do not support eGPUs, and I’d guess that the newer M1 Pro and M1 Max chips don’t either. This personally isn’t a deal breaker for me, and I’ll explain why in the next section.
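When debugging issues like these, it helps to know which architecture a given Python interpreter is actually running as. The sketch below uses the standard library’s platform.machine(): on an M1 Mac, a native Python typically reports arm64, an x86-64 Python running under Rosetta 2 reports x86_64, and an arm64 Linux container reports aarch64. It also assembles one commonly suggested Docker workaround, not verified here against the Pandas issue above: docker build --platform linux/amd64, which asks Docker Desktop to build an x86-64 image and run it under emulation (usually more slowly). The helper names describe_arch and docker_build_cmd and the image tag my-ml-image are hypothetical.

```python
import platform
import shlex
from typing import List, Optional


def describe_arch(machine: Optional[str] = None) -> str:
    """Map a platform.machine() string to a coarse architecture family.

    Pass `machine` to classify an arbitrary string; leave it as None to
    inspect the interpreter that is currently running.
    """
    m = (platform.machine() if machine is None else machine).lower()
    if m in ("arm64", "aarch64"):
        # Native Apple Silicon macOS reports "arm64"; arm64 Linux (e.g. in Docker) reports "aarch64".
        return "arm64"
    if m in ("x86_64", "amd64", "i386", "i686"):
        # Intel Macs, and x86-64 Python running under Rosetta 2, report "x86_64".
        return "x86"
    return "unknown"


def docker_build_cmd(
    tag: str, context: str = ".", target_platform: Optional[str] = "linux/amd64"
) -> List[str]:
    """Build the argument list for `docker build`, optionally pinning the target platform."""
    cmd = ["docker", "build"]
    if target_platform:
        cmd += ["--platform", target_platform]
    cmd += ["-t", tag, context]
    return cmd


if __name__ == "__main__":
    print(f"This Python reports {platform.machine()!r} -> {describe_arch()}")
    # Print the command instead of running it; pass the list to subprocess.run() to execute it.
    print(shlex.join(docker_build_cmd("my-ml-image")))
```

Pinning the platform trades speed for compatibility: the image gets the same x86-64 packages an Intel Mac would, at the cost of emulation overhead.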

I could keep going, but putting my optimism hat on, I think a lot of these issues will work themselves out over time. The M1 chip has been around for only a year and in a limited subset of the Mac lineup, so there’s still plenty of room for the software ecosystem to catch up. I bet Docker Desktop will get patched to build my ML images appropriately, and even if Anaconda never releases an official native version, Conda-Forge’s solution works great.

But you might be wondering, what about that eGPU? And what about the promise of all that extra power from these M1 Pro and Max chips? Great questions, friends. Let’s address them in the next section.

What About That Extra Compute Power?

For the data science work that I’ve done on my M1 Mac mini, I’ve definitely noticed a speed increase. Specifically, you all might be aware that performing a Grid Search to find optimal model hyperparameters can take quite a long time. This is because the algorithm is put through the wringer: the model is trained and evaluated on every possible combination of the hyperparameters you supply, and all of that work lands on the computer’s CPU. When I ran a Grid Search job with modest parameters on the Titanic dataset on my former early-2019 13″ MacBook Pro, it took about one minute to complete. When I ran the exact same Grid Search job on my M1 Mac mini, it completed in about 8 seconds.
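To make the “every possible combination” point concrete, here is a minimal exhaustive grid search in plain Python. The parameter names mimic a random forest and toy_score is a toy stand-in for cross-validated accuracy; both are hypothetical, and nothing is actually trained. scikit-learn’s GridSearchCV follows the same exhaustive idea, and its cv folds multiply the number of model fits, which is why the job is so CPU-bound (its n_jobs parameter spreads those fits across cores).

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical grid: 4 * 3 * 3 = 36 combinations.
PARAM_GRID: Dict[str, List[Any]] = {
    "max_depth": [2, 4, 6, 8],
    "n_estimators": [50, 100, 200],
    "min_samples_leaf": [1, 2, 4],
}


def toy_score(max_depth: int, n_estimators: int, min_samples_leaf: int) -> float:
    """Toy stand-in for cross-validated accuracy; it peaks at (6, 100, 2)."""
    return -(max_depth - 6) ** 2 - (n_estimators - 100) ** 2 / 1e4 - (min_samples_leaf - 2) ** 2


def grid_search(
    param_grid: Dict[str, List[Any]], score: Callable[..., float]
) -> Tuple[Optional[Dict[str, Any]], float, int]:
    """Evaluate every combination; return (best params, best score, number of evaluations)."""
    names = list(param_grid)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("-inf")
    n_evals = 0
    # itertools.product enumerates the full Cartesian product of the grid values.
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        n_evals += 1
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score, n_evals


if __name__ == "__main__":
    best, best_s, n = grid_search(PARAM_GRID, toy_score)
    print(f"evaluated {n} combinations; best={best} score={best_s}")
```

The cost grows multiplicatively: adding a fourth parameter with three values would triple the work, and with 5-fold cross-validation GridSearchCV would perform 36 × 5 = 180 model fits for this grid (plus one final refit of the best combination by default).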

Astounding, right?

It is, except consider that I was doing work on my own personal machine with a public Kaggle dataset that is free and available to anybody. I’m able to keep this dataset on my Mac mini with no privacy or security concerns.

This is not how data science works at most companies.

Data security and privacy have been hot-button concerns, especially in the last few years, and nothing screams “VIOLATION!” more than having company data on a person’s laptop. Most companies have figured out ways around this through creative means. For example, you can run a managed version of JupyterHub on a Kubernetes cluster secured with proper credential access, which lets a data scientist do their work without downloading anything to their laptop. Moreover, many companies are moving to the cloud and using native services like AWS SageMaker for their data science work.

In both scenarios, the compute power of the local workstation goes unused. In other words, most data scientists would never even come close to using the performance potential of the M1 chips. The compute happens in another environment closer to where the data resides. At that point, an M1 Mac is basically a fancy window into a place where the “real work” is happening. So while I recognize that the M1 Pro and Max chips probably do offer amazing performance, the data security concerns of the company you work for would probably nullify any chance of you leveraging that performance.

(Side note: I’ve worked enough with Windows machines to know that I would still recommend MacBooks for data scientists due to macOS’s Unix roots, but if you’re an employer, I couldn’t blame you for saving money by giving your employees MacBook Airs instead of MacBook Pros.)

Final Thoughts

Reiterating the question posed at the start: do you need the 2021 MacBook Pro for data science? No.

But “need” is such a strong word, and everything we talked about in this post was narrowly scoped to data science. There are many more factors you have to consider, including things like…

  • Has it been a while since I’ve upgraded my Mac?
  • Do I also do things like video editing that would make really nice use of those M1 Pro and Max chips?
  • Do I just think it’s pretty and super cool?

If you answered yes to any of those questions, then by all means, go for it! While I’m personally happy with my M1 Mac mini / M1 iPad Pro combo, this newest generation of MacBook Pros looks really cool, and I’m excited for all of you who will be picking one up!


Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot



via WordPress https://ramseyelbasheer.io/2021/10/20/do-you-need-the-2021-macbook-pro-for-data-science/
