Data Science Job Market Trend Analysis for 2021
Are you preparing for a data science job interview in 2021? We analyzed hiring trends from more than 3,000 data science job postings across several online career portals. Hopefully, these insights will help you prepare for an interview by showing what employers expect and where the market demand lies.
Data science and machine learning opportunities in the US are getting better every year. Companies across industries and functions (IT, marketing, consulting, etc.) have begun ramping up their use of, and need for, data scientists. In fact, according to the recent Job Outlook report from the US Bureau of Labor Statistics, corporate demand is expected to grow enormously in the upcoming decade.
As a data scientist, you can expect to be well-compensated for your skills. To understand the role today and what corporate demand will look like in the future, we conducted our own research into the role of data scientists, including a deep dive into job portals to find out exactly what US startups and corporations are looking for in candidates.
To analyze current trends and market demand, we have tried to draw out some interesting inferences for prospective job seekers. The main aim of this analysis is to help job seekers and career transitioners better understand the current market’s needs for data scientists and machine learning practitioners.
The following data analysis gives an overview of:
- Top Companies in the US Actively Recruiting Data Scientists 🌃
- Top Locations Hiring Data Scientists in the US 🗺️
- Level of Experience Desired for Data Scientists in the US 📊
- Most in-demand Job Roles Offered by the Top Companies Hiring Data Scientists in the US ❗️
- The Trend of Positions within Different Groups of Experience Level
- Top 15 In-demand Skills for Data Scientists in the US 📚
- Top Programming Languages for Data Scientist Job Postings in the US
- Top Data Visualization Tools for Data Science Job Postings in the US
- Top Deep Learning Frameworks for Data Science Job Postings in the US
- Top Big Data Technologies for Data Scientist Job Postings in the US
- Top Web Frameworks for Data Scientist Job Postings in the US
- Final Thoughts
To kickstart the analysis, we needed the most recent and accurate data. The best option seemed to be web scraping some of the popular job portals in the US.
Web Scraping
Selenium is a free and reliable way to extract relevant information from web pages. This data analysis project uses Selenium to scrape job portal websites. Importing the necessary packages and setting the Chrome driver path is straightforward. The scraper then loops over 50 search pages, each containing brief descriptions of 20 job postings. Each pass covers about 1,000 postings (50 × 20); together with the other portals, the dataset comes to 3,000+ job postings.
First, we extract the URL of each job posting. Each URL leads to the detail page of that posting, which contains all the details needed to make inferences.
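import pandas as pd
import numpy as np
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

# Note: find_elements_by_class_name and a positional driver path belong to
# the Selenium 3 API used in this article. Selenium 4 uses
# driver.find_elements(By.CLASS_NAME, ...) and a Service object instead.
chromepath = r'D:\Drivers\Chrome Driver\chromedriver.exe'
url_list = []

# range(1, 51) covers search pages 1 through 50.
for i in range(1, 51):
    print('Opening Search Pages ' + str(i))
    page_url = 'https://jobportalexample.com/data-scientist-jobs-' + str(i)
    driver = webdriver.Chrome(chromepath)
    driver.get(page_url)
    print('Accessing Webpage OK \n')
    # Each posting link on the search page carries the "fw500" class.
    url_elt = driver.find_elements_by_class_name("fw500")
    print('Success')
    for j in url_elt:
        url = j.get_attribute("href")
        url_list.append(url)
    driver.close()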
To ease the process, the URLs are saved as a pandas DataFrame.
# Keep only non-empty hrefs (get_attribute returns None when the attribute is missing).
url_list_copy_cleaned = [i for i in url_list if i]
out_company_df = pd.DataFrame(url_list_copy_cleaned, columns=['Website'])
out_company_df.head()
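Search result pages can overlap, and `get_attribute("href")` returns `None` for elements without a link, so it may help to clean the list before visiting each URL. The following is a minimal sketch and not part of the original pipeline; the helper name `clean_urls` and the sample URLs are hypothetical.

```python
def clean_urls(urls):
    """Drop missing entries and duplicates while keeping first-seen order."""
    # dict.fromkeys preserves insertion order (Python 3.7+), so the first
    # occurrence of each URL keeps its position.
    return list(dict.fromkeys(u for u in urls if u))


# Hypothetical sample standing in for the scraped url_list.
sample = [
    "https://jobportalexample.com/job/1",
    None,
    "https://jobportalexample.com/job/2",
    "https://jobportalexample.com/job/1",
]
cleaned = clean_urls(sample)
print(cleaned)
```

The cleaned list could then be passed to `pd.DataFrame(..., columns=['Website'])` exactly as above.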
Now that the variable `url_list_copy_cleaned` holds the URLs of the job listings, the next step is to visit each of those detail pages and extract the details.
The elements that will be scraped are:
✔️Companies
✔️Locations
✔️Experience
✔️Roles
✔️Skills
jobs = {'roles': [],
        'companies': [],
        'locations': [],
        'experience': [],
        'skills': []}

# One browser session is reused for all detail pages.
driver = webdriver.Chrome(chromepath)

for url in out_company_df['Website']:
    driver.get(url)

    # Company name; NaN keeps all lists the same length when it is missing.
    try:
        name_anchor = driver.find_element_by_class_name('pad-rt-8')
        name = name_anchor.text
        jobs['companies'].append(name)
    except NoSuchElementException:
        jobs['companies'].append(np.nan)

    # Job role / title
    try:
        role_anchor = driver.find_element_by_class_name('jd-header-title')
        role_name = role_anchor.text
        jobs['roles'].append(role_name)
    except NoSuchElementException:
        jobs['roles'].append(np.nan)

    # Location
    try:
        location_anchor = driver.find_element_by_class_name('location')
        location_name = location_anchor.text
        jobs['locations'].append(location_name)
    except NoSuchElementException:
        jobs['locations'].append(np.nan)

    # Required experience
    try:
        experience_anchor = driver.find_element_by_class_name('exp')
        experience = experience_anchor.text
        jobs['experience'].append(experience)
    except NoSuchElementException:
        jobs['experience'].append(np.nan)

    # Skills; find_elements returns an empty list when nothing matches,
    # so the except branch here is only a safeguard.
    try:
        skills_anchor = driver.find_elements_by_class_name("chip")
        each_skill = []
        for skills in skills_anchor:
            each_skill.append(skills.text)
        jobs['skills'].append(each_skill)
    except NoSuchElementException:
        jobs['skills'].append(np.nan)

driver.close()
Note that catching the `NoSuchElementException` error is very important, since a few URLs lead directly to the company's own website rather than to another details page on the job portal. In such cases, the HTML element we are looking for might not be present, and Selenium raises this exception.
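The repeated try/except pattern can also be folded into one small helper. The sketch below illustrates the idea without a browser: `FakePage`, `FakeElement`, and the local `NoSuchElementException` are stand-ins invented here so the example runs on its own, and `text_or_default` is a hypothetical name. With Selenium, the real driver and `selenium.common.exceptions.NoSuchElementException` would take their place.

```python
import math


class NoSuchElementException(Exception):
    """Stand-in for selenium.common.exceptions.NoSuchElementException."""


class FakeElement:
    """Stand-in for a WebElement; only the .text attribute is modeled."""

    def __init__(self, text):
        self.text = text


class FakePage:
    """Stand-in for a WebDriver: maps class names to element text."""

    def __init__(self, elements):
        self._elements = elements

    def find_element_by_class_name(self, class_name):
        if class_name not in self._elements:
            raise NoSuchElementException(class_name)
        return FakeElement(self._elements[class_name])


def text_or_default(page, class_name, default=math.nan):
    """Return the element's text, or `default` when the element is absent."""
    try:
        return page.find_element_by_class_name(class_name).text
    except NoSuchElementException:
        return default


# A page that has a company name but no location element,
# as happens when a URL leads to an external company site.
page = FakePage({"pad-rt-8": "Example Corp"})
jobs = {"companies": [], "locations": []}
jobs["companies"].append(text_or_default(page, "pad-rt-8"))
jobs["locations"].append(text_or_default(page, "location"))
print(jobs)
```

Because every call appends exactly one value, found or not, the lists stay the same length and can later be turned into a DataFrame without misaligned rows.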
For easier data handling and preprocessing, the scraped data is best consolidated into a pandas DataFrame. After all the preprocessing steps, such as dropping null values, splitting columns, and tokenizing the locations and skills columns, the cleaned dataset is taken to Tableau for visualization 📈.
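As a rough illustration of those preprocessing steps, the sketch below works on plain Python records rather than the scraped DataFrame. The sample rows, the `"2-5 Yrs"` experience format, the comma-separated location format, the use of `None` (instead of `np.nan`) for missing values, and the helper names are all assumptions made for this example; the actual portal data may look different.

```python
import re
from collections import Counter

# Hypothetical records in the shape produced by the scraping loop.
raw_jobs = [
    {"roles": "Data Scientist", "locations": "New York, Boston",
     "experience": "2-5 Yrs", "skills": ["Python", "SQL", "Tableau"]},
    {"roles": "ML Engineer", "locations": None,
     "experience": "5-8 Yrs", "skills": ["Python", "TensorFlow"]},
    {"roles": "Data Analyst", "locations": "Chicago",
     "experience": "0-2 Yrs", "skills": ["sql", "Excel "]},
]


def split_experience(text):
    """Split an assumed 'min-max Yrs' string into two integers."""
    match = re.match(r"\s*(\d+)\s*-\s*(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def clean(records):
    """Drop incomplete rows, split experience, tokenize locations and skills."""
    cleaned = []
    for row in records:
        # Drop rows with any missing field (the "dropping null values" step).
        if any(value is None for value in row.values()):
            continue
        exp = split_experience(row["experience"])
        if exp is None:
            continue
        cleaned.append({
            "role": row["roles"],
            "min_exp": exp[0],
            "max_exp": exp[1],
            "locations": [c.strip() for c in row["locations"].split(",")],
            # Normalize case and whitespace so "sql" and "SQL" count together.
            "skills": [s.strip().lower() for s in row["skills"]],
        })
    return cleaned


rows = clean(raw_jobs)
# Count how often each skill appears across all cleaned postings.
skill_counts = Counter(skill for row in rows for skill in row["skills"])
print(rows[0]["locations"], skill_counts.most_common(2))
```

A tally like `skill_counts` is the kind of aggregate that feeds a "top skills" chart; in pandas, similar steps would typically go through `dropna`, `str.split`, and `explode`.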
If you are unfamiliar with Tableau, it is an American interactive data visualization software company focused on business intelligence[1].
via WordPress https://ramseyelbasheer.io/2021/05/17/data-science-job-market-trend-analysis-for-2021/