In this article, we will compare model accuracy with human-level performance and discuss concepts such as avoidable bias and how to tackle it!

The difference between human error (an approximation of Bayes error) and the training error is termed avoidable bias.

The perfect level of accuracy is not always 100%. Bayes optimal error is the lowest error that any function could theoretically achieve, so it can never be surpassed (the best possible error).
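As a worked example of the definition above, here is a minimal sketch that computes avoidable bias from error rates. The numbers and the helper name `diagnose` are hypothetical and chosen only for illustration. Variance is taken here as the gap between dev error and training error, which is an assumption, since only avoidable bias is defined above.

```python
def diagnose(human_error, train_error, dev_error):
    """Return (avoidable_bias, variance) for error rates given as fractions."""
    # Human error stands in for Bayes error, as described in the text.
    avoidable_bias = train_error - human_error
    # Assumption: variance is measured as the dev/train gap.
    variance = dev_error - train_error
    return avoidable_bias, variance


# Hypothetical error rates: humans 1%, training set 8%, dev set 10%.
bias, variance = diagnose(human_error=0.01, train_error=0.08, dev_error=0.10)

# Work on whichever gap is larger first.
focus = "bias" if bias > variance else "variance"
print(f"avoidable bias = {bias:.2f}, variance = {variance:.2f}, focus on {focus}")
```

With these numbers the avoidable bias (about 7 points) is larger than the variance (about 2 points), so reducing bias would be the more promising direction.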

As long as our model is doing worse than humans, we can use some tactics to improve it. Although knowing about bias and variance helps and it turns out that…

In this article, I will take you through some of the practical considerations and challenges of using AI in medical diagnosis. The criticality of this domain makes it important to pay heed to these challenges in addition to the core AI/ML practices.

**Imbalanced dataset:** Having more data samples **without disease** than **with disease** results in an imbalanced dataset, which poses issues when training algorithms for medical analysis.

**Solution:**

a) Resampling data (Oversampling, Undersampling, etc.)

b) Modifying the loss function, i.e. using a weighted loss to account for the imbalanced classes.
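Option (b) can be sketched with a weighted binary cross-entropy in NumPy. This is only one possible weighting scheme and an assumption on my part: each class is weighted by the frequency of the other class, so that both classes contribute equally to the total loss. The function name `weighted_bce` and the toy labels are hypothetical.

```python
import numpy as np


def weighted_bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy with class weights derived from class frequency."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() never receives exactly 0 or 1.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)

    pos_frac = y_true.mean()
    w_pos = 1.0 - pos_frac  # rare positives get a large weight
    w_neg = pos_frac        # frequent negatives get a small weight

    per_sample = -(w_pos * y_true * np.log(y_prob)
                   + w_neg * (1 - y_true) * np.log(1 - y_prob))
    return per_sample.mean()


# Toy data: 1 sample with disease, 9 without; the model predicts 0.5 everywhere.
labels = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
probs = np.full(10, 0.5)
loss = weighted_bce(labels, probs)
print(loss)
```

In this toy case the single positive sample carries weight 0.9 and each of the nine negatives carries weight 0.1, so the two classes contribute the same amount to the loss.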

2. **Small training dataset**…

Transfer learning implies adapting a network trained for one problem to a different problem. It is common to pre-train a CNN on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use that either as an initialization or a fixed feature extractor for the task of interest.

- Using large networks that were trained with vast datasets for our new tasks reduces time and computation requirements.
- It is relatively rare to have a dataset of sufficient size. With transfer learning, we can build good classifiers with a few hundred images.

**Finetuning:** Starting with…

In this post, we will discuss ‘Broadcasting’ using NumPy. Broadcasting is also used when implementing neural networks, because broadcast operations are efficient in both memory and computation.

So let’s understand what Broadcasting means, followed by a few examples!

Broadcasting describes the way NumPy treats arrays with different shapes during arithmetic operations. The smaller array is broadcast across the larger array so that they have compatible shapes. It provides a way to vectorize array operations, leading to efficient implementations.
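A small sketch of this behaviour follows. The array names and values are made up for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
col = np.array([[100], [200]])       # shape (2, 1)

# The 1-D row is broadcast across both rows of the matrix.
added_row = matrix + row
# The column is broadcast across all three columns of the matrix.
added_col = matrix + col
# A scalar is broadcast to every element.
doubled = matrix * 2

print(added_row)
print(added_col)
print(doubled)

# Shapes (2, 3) and (2,) are not compatible: the trailing dimensions 3 and 2 differ.
try:
    matrix + np.array([1, 2])
    incompatible = False
except ValueError:
    incompatible = True
print("incompatible shapes raise ValueError:", incompatible)
```

Two dimensions are compatible when they are equal or when one of them is 1, compared from the trailing dimension backwards. That is why the `(3,)` row and the `(2, 1)` column both work with the `(2, 3)` matrix while a `(2,)` array does not.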

The light-bordered boxes represent the broadcast values. This extra memory is not actually allocated during the operation, but can be useful conceptually…

In this post, I will discuss convolutions and how they act as image filters by implementing the convolution operation with a few edge detection kernels.

So let’s start!

A convolution is a mathematical operation on two functions that produces a third function expressing how one is modified by the other.

To give an example, the first function can be the image and the second function a matrix (the kernel) that slides over the image and transforms it. …
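The sliding-kernel idea can be sketched in plain NumPy as follows. The helper `convolve2d`, the toy image, and the choice of a Sobel kernel are assumptions made for illustration, not necessarily the kernels used in the post. The kernel is flipped before sliding so that the operation is a true convolution rather than a cross-correlation.

```python
import numpy as np


def convolve2d(image, kernel):
    """'Valid' 2-D convolution: flip the kernel, then slide it over the image."""
    kernel = np.flipud(np.fliplr(kernel))
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the current image patch.
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output


# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Sobel kernel that responds to horizontal intensity changes (vertical edges).
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

edges = convolve2d(image, sobel_x)
print(edges)
```

The output is non-zero only where the window overlaps the dark-to-bright boundary and zero over the uniform bright region, which is what makes the kernel act as an edge detector.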

In this post, we will cover the differences between a Fully connected neural network and a Convolutional neural network. We will focus on understanding the differences in terms of the model architecture and results obtained on the MNIST dataset.

- A fully
- The major advantage of fully connected networks is that they are “structure agnostic”, i.e. no special assumptions need to be made about the input.
- While being structure agnostic makes fully connected networks…
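One architectural difference between the two network types can be illustrated by counting trainable parameters in a single layer. The layer sizes below (128 dense units, 32 filters of size 3x3) are hypothetical choices for a 28x28 grayscale MNIST image and are not taken from the models compared in the post.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus biases of a 2-D convolutional layer."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# A flattened 28x28 image feeding 128 dense units.
fc_count = dense_params(28 * 28, 128)
# 32 filters of size 3x3 applied to a single-channel image.
conv_count = conv2d_params(3, 3, 1, 32)

print(f"fully connected layer: {fc_count} parameters")
print(f"convolutional layer:   {conv_count} parameters")
```

The convolutional layer reuses the same small kernel at every image position, so its parameter count does not grow with the image size, whereas the dense layer needs one weight per pixel per unit.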

Many MOOCs teach everything from basic machine learning techniques to deep learning algorithms using toy datasets that are clean and small to work with. In reality, when working with real data, 50–60% of the time is spent making it fit for analysis.

Data quality, in terms of coverage, completeness, and correctness, plays a crucial role in the success of data science projects by helping businesses get the right insights!

With this intuition in mind, I thought of writing about which data quality checks can be performed based on…

In continuation of my previous blogs, part-1 and part-2, where we explored COVID tweet data and performed topic modeling respectively, we will build a sentiment classifier in this part.

Although basic data exploration was done in the previous parts, here is a little glimpse of the data again!!

- A glimpse of the COVID Tweet dataset

In continuation of part-1, where we explored Twitter data related to COVID, in this post we will use Topic Modelling to learn more about the underlying key ideas that people are tweeting about.

Let’s first understand what Topic Modelling is!

Topic Modelling is an unsupervised technique that helps find the underlying topics, also termed latent topics, present in a large collection of documents.

In the real world, we observe a lot of unlabelled text data in the form of comments, reviews, complaints, etc. …

In this blog, I have taken the COVID tweet dataset from Kaggle and explored it to understand what people are talking about using NLP techniques.

So let’s understand the data about new normal beginnings!!

I have taken the “Corona Virus Tagged Data” dataset from Kaggle. The tweets were pulled from Twitter and tagged manually. Names and usernames have been replaced with codes to avoid privacy concerns. There are two datasets available: train.csv and test.csv. I have used train.csv for this exploratory analysis.

**Columns present:**

- UserName
- ScreenName
- Tweet At
- Original Tweet
- Label

Data Scientist. LinkedIn — https://www.linkedin.com/in/pooja-mahajan-69b38a98/.