Ramsey Elbasheer - Data Science for the EMEA

Posts

Showing posts from January, 2021

Natural Language Processing: tokenization and numericalization

NLP processing techniques. NLP is now considered one of the fastest-growing fields in Deep Learning, with a widening range of applications: from detecting or generating articles and reviews to medical applications such as diagnosis recognition, not to mention extensive business opportunities. How does it work? The basic first step is converting text with tokenization, which breaks raw text into smaller pieces. Tokens can be words, single characters, or subwords (character n-grams: a contiguous sequence of n items from a given sample of text or speech). The most common way to tokenize is to split on spaces. Taking the space as a delimiter, for example, “Natural Language Processing” yields 3 tokens: “Natural”, “Language”, “Processing”. Major techniques for tokenizing are: Split(): split method is used to break the given...
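
The three token granularities mentioned in the excerpt above (words, single characters, character n-grams) can be sketched in a few lines of plain Python. This is only an illustration: the helper name `char_ngrams` is hypothetical and does not come from the original article.

```python
text = "Natural Language Processing"

# Word tokens: split on the space delimiter, as in the excerpt's example.
word_tokens = text.split(" ")

# Character tokens: every character, spaces included, becomes a token.
char_tokens = list(text)

def char_ngrams(word, n):
    """Return the contiguous character n-grams of `word` (hypothetical helper)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

trigrams = char_ngrams("Natural", 3)

print(word_tokens)
print(trigrams)
```

Splitting on a single space is the simplest option; it leaves punctuation attached to words, which is one reason real tokenizers tend to be more elaborate.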

Perplexity …

10:28 PM

… in the context of Natural Language Processing. Perplexity formula. What is perplexity? Perplexity is a measurement of how well a probability model predicts a sample. A language model is a kind of probability model that measures how likely a given sentence is according to a large corpus of text, the training set (the Wall Street Journal dataset, YouTube comments in a given language, the Brown corpus …). A unigram model (order 1) is an example of a language model: it gives the probability of a sentence by multiplying the probabilities of the words in the sentence, based on their frequency in the training set. A bigram model (order 2) is another example: it gives the probability of a sentence by multiplying the probability of each word given the previous word (except for the first word), based on the frequency of those pairs (or of the first word) in the training set. This can be generalised to order N. Having this in m...
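
As a rough sketch of the unigram and bigram models described above, the following Python computes sentence probabilities from raw counts, plus a unigram perplexity using the standard definition exp(-(1/N) Σ log p(w_i)). The excerpt is cut off before its own formula, so that definition is an assumption here, as are the function names and the tiny toy corpus. No smoothing is applied, so every test word (and word pair) is assumed to appear in the training set.

```python
import math
from collections import Counter

def unigram_sentence_prob(train_tokens, sentence):
    """P(sentence) as the product of relative word frequencies."""
    counts = Counter(train_tokens)
    prob = 1.0
    for tok in sentence:
        prob *= counts[tok] / len(train_tokens)
    return prob

def bigram_sentence_prob(train_tokens, sentence):
    """P(w1) * product of P(w_i | w_{i-1}), estimated from pair counts."""
    unigrams = Counter(train_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    prob = unigrams[sentence[0]] / len(train_tokens)
    for prev, cur in zip(sentence, sentence[1:]):
        # Assumes `prev` was seen in training; otherwise this divides by zero.
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

def unigram_perplexity(train_tokens, test_tokens):
    """exp of the average negative log-probability per test token."""
    counts = Counter(train_tokens)
    log_prob = sum(math.log(counts[tok] / len(train_tokens)) for tok in test_tokens)
    return math.exp(-log_prob / len(test_tokens))

train = "a b a b".split()
p_uni = unigram_sentence_prob(train, ["a", "b"])
p_bi = bigram_sentence_prob(train, ["a", "b"])
ppl = unigram_perplexity(train, ["a", "b"])
```

On this toy corpus each word has probability 0.5, so the unigram perplexity is 2: the model is as uncertain as a fair choice between two words.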

Clinical Trial Management

10:28 PM

“Clinical trials” became a buzzword during the pandemic. They played a crucial role in developing vaccines to fight it. Experts from different fields contribute to vaccine development, including (but not limited to) clinical researchers, health care providers, the pharmaceutical industry, data managers, biostatisticians, data scientists, and clinical trial programmers. Data collection, management, analysis, and reporting also play an important role in helping decision makers approve or reject a vaccine. This book provides an overview of the clinical trial management process, including protocol development, subject recruitment, the professionals and organizations involved in clinical trials, data collection, analysis, and reporting. It also covers the models related to Clinical Data Interchange Standards Consortium (CDISC) standards such as the Study Data Tabulation Model (SDTM) and the Analysis Data Model (ADaM). T...

A Bayesian Take On Model Regularization

9:08 PM

In this article, we explore how we can, and do, regularize and control the complexity of the models we learn through Bayesian prior beliefs. I’m currently reading “How We Learn” by Stanislas Dehaene. First off, I cannot recommend this book enough to anyone interested in learning, teaching, or AI. One of the main themes of this book is explaining the neurological and psychological bases of why humans are so good at learning things quickly and with great sample-efficiency, i.e. given only a limited amount of experience¹. One of Dehaene’s main arguments for why humans can learn so effectively is that we are able to reduce the complexity of the models we formulate of the world. In accordance with the principle of Occam’s Razor², we find the simplest model possible that explains the data we experience, rather than opting for more complicated models. Bu...

Tokenization and numericalization are the different things to solve the same problem?

8:23 PM

Introduction. Preprocessing is one of the most important tasks that have to be done before a model can be trained. Training itself depends mostly on hardware, as long as you have selected a well-optimized transfer learning model, while preprocessing and dataset preparation account for 90% of the work of every Machine Learning engineer. In this article, we are going to learn about two key preprocessing operations in natural language processing, Tokenization and Numericalization, with a major focus on the former, as it deals with how we handle semantic units like words, phrases, and sentences. Disclaimer: The article is prepared in close relation to the content taught in the fast.ai course, particularly in: Let’s lay out a high-level overview of the Language Model. Most Natural Language Processing problems can be solved through transfer learning, where ...
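
The excerpt above stops before it defines numericalization. In the usual sense of the term, it means mapping each token to an integer index in a vocabulary. A minimal sketch under that assumption follows; the function names, the `<unk>` placeholder, and the `itos`/`stoi` naming are illustrative choices, not the article's or fast.ai's actual API.

```python
from collections import Counter

def build_vocab(tokens, specials=("<unk>",)):
    """Build index-to-string and string-to-index mappings, most frequent tokens first."""
    counts = Counter(tokens)
    itos = list(specials) + [tok for tok, _ in counts.most_common()]
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    """Replace each token with its vocabulary index; unknown tokens map to <unk>."""
    unk = stoi["<unk>"]
    return [stoi.get(tok, unk) for tok in tokens]

tokens = "the cat sat on the mat".split()   # tokenization: text -> tokens
itos, stoi = build_vocab(tokens)
ids = numericalize(tokens, stoi)            # numericalization: tokens -> integers
unseen = numericalize(["dog"], stoi)
```

Seen this way the two steps are complementary rather than alternatives: tokenization decides what the units are, and numericalization turns those units into the integers a model can consume.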

Stochastic Gradient Descent (SGD)

7:23 PM

Implementation

# import the necessary packages
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import numpy as np
import argparse

def sigmoid_activation(x):
    # compute the sigmoid activation value for a given input
    return 1.0 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    # compute the derivative of the sigmoid function ASSUMING
    # that the input `x` has already been passed through
    # the sigmoid activation function
    return x * (1 - x)

def predict(X, W):
    # take the dot product between our features and weight matrix
    preds = sigmoid_activation(X.dot(W))
    # apply a step function to threshold the outputs to binary
    # class labels
    preds[preds <= 0.5] = 0
    preds[preds > 0] = 1
    # return the predictions
    return preds

def next_batch(X, y, batchSize):
    # loop over our dataset `X` in mini-...
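
Because the excerpt is cut off before the training loop, here is a separate, self-contained sketch of mini-batch SGD. It is not a reconstruction of the original code: it fits a one-dimensional linear model in plain Python instead of the sigmoid classifier above, and the names `next_batch` and `sgd_linear_fit`, the learning rate, and the toy data are all assumptions chosen for illustration.

```python
import random

def next_batch(xs, ys, batch_size):
    """Yield successive mini-batches of (inputs, targets)."""
    for i in range(0, len(xs), batch_size):
        yield xs[i:i + batch_size], ys[i:i + batch_size]

def sgd_linear_fit(xs, ys, lr=0.5, epochs=200, batch_size=4, seed=0):
    """Fit y = w * x + b by mini-batch SGD on half the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)                      # new sample order each epoch
        sx = [x for x, _ in data]
        sy = [y for _, y in data]
        for bx, by in next_batch(sx, sy, batch_size):
            errors = [(w * x + b) - y for x, y in zip(bx, by)]
            # gradients of the batch loss with respect to w and b
            grad_w = sum(e * x for e, x in zip(errors, bx)) / len(bx)
            grad_b = sum(errors) / len(bx)
            w -= lr * grad_w                   # step against the gradient
            b -= lr * grad_b
    return w, b

# Noise-free toy data generated from y = 2x + 1.
xs = [i / 20 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(round(w, 3), round(b, 3))
```

The key difference from full-batch gradient descent is that the weights are updated once per mini-batch rather than once per pass over the data.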

A gaming laptop for Data Science and ML?

4:53 PM

Gaming laptops now include super-powerful GPUs that could be just perfect for Deep Learning and Data Science. Image source: Future. For a long time, gaming laptops were known for being extravagant, expensive, large, noisy, prone to overheating, and expensive. Crafted for a specific market of consumers that value the video game experience more than reality, these powerful machines really had no other use. However, since the time of the first Alienware laptops, a lot has changed. We now find models such as the Asus Zephyrus series, the Razer 15, and the not-really-a-gaming-laptop Dell XPS 15, all of which actually deliver a lot of power in compact and somewhat aesthetic designs. Did I mention they are expensive? In a somewhat unrelated field, Machine Learning has become the standard way to model large datasets. From business to academia, ML models are constantly proving to be extremely useful and everyone wants to use...

4 techniques of evaluating the performance of deep learning models using validation.

4:08 PM

How to evaluate the performance of your model during training. Photo by Markus Spiske on Unsplash. Validation is a technique in machine learning for evaluating the performance of models during learning. It is done by separating the data set into training and validation sets and then evaluating the performance of the model (a deep neural network in this case) on the validation set. It is important to note that the validation set is quite different from the test set: the validation set is commonly used to evaluate the model’s performance while it is being trained, whereas the test set is used to evaluate the model’s performance on data it has not seen before. This article mainly covers different techniques used in validating the model. Why should we validate models? This is often a pertin...
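
The separation into training and validation sets described above can be sketched as a simple hold-out split. The excerpt does not say which four techniques the article covers, so this shows only the most basic idea; the function name, the 20% fraction, and the fixed seed are assumptions.

```python
import random

def train_val_split(samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction of them for validation."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = int(len(samples) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    train = [samples[i] for i in train_idx]
    val = [samples[i] for i in val_idx]
    return train, val

samples = list(range(100))          # stand-in for real (input, label) pairs
train, val = train_val_split(samples)
print(len(train), len(val))
```

The model is then fit on `train` only, and its metrics on `val` are tracked during training; a separate test set stays untouched until the end.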

Can Machines Learn on Their Own?

4:08 PM

When you were a baby, you had to learn a lot of things. How to talk, how to walk, how to use the washroom, and how to not throw a tantrum when you don’t get what you want at the store. All things we have to learn (except some don’t learn the last one, looking at you, Karen). We also spend a large chunk of our lives just learning things in school. Learning is an integral part of our lives; it helps us understand the world around us and know what to do when. Now when we think about machines or robots, we think of them as things we have to program every move into. If and else statements everywhere! But what if machines could learn, just like we do? A Brief Explanation of Reinforcement Learning. Reinforcement learning is basically how humans learn. There’s an agent (the machine), the agent makes an action (the output) in the environment, and the environment returns a state (inputs) and rewards (pos...
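
The agent, action, state, and reward cycle described above can be sketched as a short interaction loop. Everything here is a hypothetical toy: the `LineWorld` environment, the fixed `always_right` policy, and the reward values are invented for illustration. The sketch shows only the interaction loop, not a learning algorithm.

```python
class LineWorld:
    """Hypothetical toy environment: walk from position 0 to the goal at position 4."""

    def __init__(self, goal=4):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to [0, goal]
        self.position = max(0, min(self.goal, self.position + action))
        done = self.position == self.goal
        reward = 1.0 if done else 0.0      # reward only when the goal is reached
        return self.position, reward, done

def always_right(state):
    """A fixed policy standing in for the agent: always choose +1."""
    return 1

def run_episode(env, policy, max_steps=100):
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)                   # agent acts on the current state
        state, reward, done = env.step(action)   # environment returns state and reward
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps

total_reward, steps = run_episode(LineWorld(), always_right)
print(total_reward, steps)
```

A learning agent would replace `always_right` with a policy that is adjusted over many episodes so that the total reward increases.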

How to Write a Neurips Paper [0]

4:08 PM

I have been planning to write this series for a while now, and the beginning of 2021 is an arbitrarily auspicious time to start. This series serves two purposes: First, as an account/guide on the process of developing original research for publication, written by someone who has done it a few times. Second, to share some funny stories of my time as a PhD student / postdoc at MIT, and keep it very personal. I remember Philip Guo’s The PhD Grind being very informative when I started grad school, and I wanted to make something similar for other PhD students / researchers. However, I will focus specifically on the process of research, a universal yet at first confusing activity that everyone must go through. Also, I want to tell my story in a more light-hearted tone, as my PhD journey was, for the most part, fun. AI/ML Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Drive...

Machine Learning For Beginners: Classifying Iris Species

4:08 PM

A First Application: Classifying iris species. We will go through a simple machine learning application and create our first model. In the process, we will introduce some core concepts and nomenclature for machine learning. Meet the data. The Iris flower data set, or Fisher’s Iris data set, is a multivariate data set. It consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other. Preview of Data. There are 150 observations with 4 features each (sepal length, sepal width, petal length, petal width). There are 50 observations of each species (setosa, versicolor, virginica). It is included in scikit-learn in the datasets module. We can load it by...
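
The excerpt is cut off before it shows how the data is loaded. One common way, assuming scikit-learn is installed, is the `load_iris` function from `sklearn.datasets`; whether the original article uses exactly this call is not confirmed by the excerpt.

```python
from sklearn.datasets import load_iris

# load_iris returns a Bunch object with the measurements and the species labels.
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)                    # (number of observations, number of features)
print(iris.feature_names)         # names of the four measured features
print(list(iris.target_names))    # names of the three species
```

Here `X` holds the four measurements per flower and `y` holds an integer code for the species, which matches the 150 observations and three classes described above.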

AI in Healthcare Data Centers

4:08 PM

Data volumes and velocities in health care and other industries are growing at incredibly fast rates, driven largely by cloud computing and the proliferation of smart connected devices across the world. This information explosion has created opportunities for organizations to improve processes through analytics, but also significant challenges in terms of their capacity to handle all that information. And while the big data revolution has touched virtually every business unit across the health care spectrum, nowhere has been more affected than the data center. Indeed, this explosion in data generation has put tremendous pressure on data center infrastructure where tens of thousands of events per second (EPS) can occur. There is too much data, in many cases, for humans to keep up. Not only that, but bad actors looking to steal sensitive data have now weaponized artificial intelligence (AI), leaving chief information securi...

New Applications of Photonics for Artificial Intelligence and Neuromorphic Computing

4:08 PM

The University of Exeter explores the future potential for computer systems by using photonics in place of conventional electronics. Light bloom in the bottle. Photo by FLY:D on Unsplash. Photonics, the science of generating, controlling, and detecting light, is now playing a paramount role in the future of Artificial Intelligence and Neuromorphic Computing. Just as the 20th century depended on the electron for advances in electronics and electricity, the 21st century relies on the photon to propel many scientific breakthroughs in different fields. Photonics plays an important role in driving innovation in an increasing number of fields. The application of photonics spreads across several sectors: From optical data communications to imaging, from lighting to displays; from the...