In this article, we will compare our model's accuracy with human-level performance and discuss concepts like avoidable bias and how to tackle it!


What is avoidable bias?

The difference between human error (approximation of Bayes error) and the training error is termed avoidable bias.

The perfect level of accuracy may not always be 100%: the Bayes optimal error is the best possible error, a theoretical limit that no function can surpass.
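As a sketch, the decision between reducing bias and reducing variance can be read straight off the error numbers; the values below are hypothetical, for illustration only:

```python
# Sketch: decomposing model error into avoidable bias and variance.
# The error values below are hypothetical, for illustration only.
human_error = 0.01     # proxy for the Bayes error
train_error = 0.08
dev_error = 0.10

avoidable_bias = train_error - human_error  # gap we could still close
variance = dev_error - train_error          # gap between train and dev

if avoidable_bias > variance:
    print("Prioritise bias reduction (bigger model, train longer)")
else:
    print("Prioritise variance reduction (more data, regularisation)")
```

Here the avoidable bias (0.07) dominates the variance (0.02), so bias-reduction tactics would come first.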

Comparison with human-level performance

As long as our model performs worse than humans, we can use certain tactics to improve it. Knowing about bias and variance helps here, and it turns out that…



In this article, I will be taking you through some of the practical considerations and challenges of using AI in Medical diagnosis. The criticality involved in this domain makes it important to pay heed to these challenges apart from the core AI/ML practices.

Dataset considerations

  1. Imbalanced dataset — The presence of many more data samples without disease than with disease results in imbalanced datasets, which pose issues when training algorithms for medical analysis.

Solution:

a) Resampling data (Oversampling, Undersampling, etc.)

b) Modifying the loss function, i.e. using a weighted loss to account for the imbalanced classes.
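As a sketch of option (b), here is a weighted binary cross-entropy in plain NumPy; the label array and the inverse-frequency weighting scheme are illustrative assumptions, not the only choice:

```python
import numpy as np

# Hypothetical imbalanced labels: 1 = disease (rare), 0 = no disease.
labels = np.array([0] * 90 + [1] * 10)

# Weight each class by the inverse of its frequency, so the rare
# "disease" class contributes as much to the loss as the common one.
n = len(labels)
w_pos = n / (2 * np.sum(labels == 1))   # minority-class weight
w_neg = n / (2 * np.sum(labels == 0))   # majority-class weight

def weighted_bce(y_true, p_pred, eps=1e-7):
    """Binary cross-entropy with per-class weights."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(w_pos * y_true * np.log(p)
                    + w_neg * (1 - y_true) * np.log(1 - p))
```

With 10% positives, the minority class gets weight 5.0 and the majority class about 0.56, so misclassifying a rare disease case costs roughly nine times more.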

2. Small training dataset



Transfer learning implies adapting a network trained for one problem to a different problem. It is common to pre-train a CNN on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use that either as an initialization or a fixed feature extractor for the task of interest.

Why use Transfer Learning?

  • Using large networks that were trained with vast datasets for our new tasks reduces time and computation requirements.
  • It is relatively rare to have a dataset of sufficient size. With transfer learning, we can build good classifiers with a few hundred images.
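A minimal Keras sketch of the fixed-feature-extractor pattern described above. Note that `weights=None` is used here only so the snippet runs without downloading anything; in practice you would pass `weights="imagenet"`. The input size and the two-class head are illustrative assumptions:

```python
import numpy as np
from tensorflow import keras

# Sketch: a pre-trained CNN as a fixed feature extractor.
# weights=None keeps the snippet self-contained; in practice you
# would pass weights="imagenet" to load the pre-trained weights.
base = keras.applications.MobileNetV2(weights=None, include_top=False,
                                      input_shape=(96, 96, 3), pooling="avg")
base.trainable = False  # freeze the (pre-trained) layers

# New classification head for the task of interest (e.g. 2 classes).
model = keras.Sequential([
    base,
    keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

probs = model.predict(np.zeros((1, 96, 96, 3)), verbose=0)
```

Only the small `Dense` head is trained, which is why a few hundred task-specific images can be enough.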

Types of Transfer Learning:

  • Fine-tuning — Starting with…


In this post, we will discuss ‘Broadcasting’ in NumPy. Broadcasting is also used when implementing neural networks, as broadcast operations are memory- and computationally efficient.

So let’s understand what Broadcasting means followed by a few examples!

Broadcasting describes the way NumPy treats arrays with different shapes during arithmetic operations. The smaller array is broadcast across the larger array so that they have compatible shapes. It provides a way to vectorize array operations, thus leading to efficient implementations.
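A minimal NumPy example of these rules:

```python
import numpy as np

# Broadcasting: the 1-D array is stretched across the rows of the
# 2-D array without copying any memory.
a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
b = np.array([10, 20, 30])  # shape (3,)

result = a + b              # b is broadcast to shape (2, 3)

# A column vector broadcasts the other way, across the columns:
col = np.array([[100], [200]])  # shape (2, 1)
result2 = a + col
```

Here `result` is `[[11, 22, 33], [14, 25, 36]]`: each row of `a` gets `b` added to it, exactly as if `b` had been tiled into a `(2, 3)` array.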

Broadcasting examples (figure)

The light-bordered boxes represent the broadcast values; this extra memory is not actually allocated during the operation, but it can be useful conceptually…


Using Python’s scipy library for implementation

In this post, I will discuss convolutions and how they act as image filters by implementing the convolution operation with a few edge-detection kernels.

So let’s start!

What are Convolutions?

A convolution is a mathematical operation on two functions that produces a third function expressing how one is modified by the other.

To give an example, the first function can be the image and the second a matrix (the kernel) sliding over the image, transforming the input image. …
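A small sketch using scipy, as in the post's title; the tiny image and the Sobel-style kernel are made up for illustration:

```python
import numpy as np
from scipy.signal import convolve2d

# Hypothetical 4x4 image: a dark left half and a bright right half.
image = np.array([[0, 0, 10, 10],
                  [0, 0, 10, 10],
                  [0, 0, 10, 10],
                  [0, 0, 10, 10]], dtype=float)

# Sobel-style vertical edge-detection kernel.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

edges = convolve2d(image, kernel, mode="valid")
# Large-magnitude responses appear where the intensity changes,
# i.e. at the vertical boundary between the two halves.
```

Every valid 3x3 window in this tiny image straddles the boundary, so each output value has magnitude 40; on a larger image, flat regions would give responses near zero.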


Implementation using Keras

In this post, we will cover the differences between a Fully connected neural network and a Convolutional neural network. We will focus on understanding the differences in terms of the model architecture and results obtained on the MNIST dataset.

Fully connected neural network

  • A fully connected neural network consists of a series of fully connected layers that connect every neuron in one layer to every neuron in the next layer.
  • The major advantage of fully connected networks is that they are “structure agnostic” i.e. there are no special assumptions needed to be made about the input.
  • While being structure agnostic makes fully connected networks…
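For concreteness, a fully connected network of the kind described above can be sketched in Keras; the layer sizes are illustrative, not tuned:

```python
from tensorflow import keras

# Sketch: a simple fully connected network for MNIST-sized inputs
# (28x28 images flattened to 784 features, 10 digit classes).
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # one unit per digit
])
model.build((None, 784))  # every input pixel connects to every unit
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Note that the image must be flattened to a 784-vector first: the network is structure agnostic precisely because it treats the pixels as an unordered feature list.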



Many MOOCs teach everything from basic machine learning techniques to deep learning algorithms using toy datasets that are clean and small to work with. In reality, when working with real data, 50–60% of the time is spent making it fit for any analysis.

The quality of data in terms of coverage, completeness, and correctness plays a crucial role in the success of data science projects by helping businesses obtain the right insights!

With this intuition in mind, I thought of writing about the data quality checks that can be performed based on…
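As a sketch, a few such checks in pandas on a made-up dataframe; the column names and the valid ranges are assumptions for illustration:

```python
import pandas as pd

# Hypothetical dataframe with a missing value, an impossible age,
# and an unexpected category.
df = pd.DataFrame({
    "age": [25, None, 130, 40],
    "gender": ["M", "F", "F", "X"],
})

# Completeness: fraction of missing values per column.
missing = df.isna().mean()

# Correctness: flag rows with out-of-range ages.
bad_age = df[(df["age"] < 0) | (df["age"] > 120)]

# Coverage: values outside the expected category set.
unexpected = set(df["gender"]) - {"M", "F"}
```

Each check maps to one quality dimension from above: `missing` measures completeness, `bad_age` correctness, and `unexpected` coverage of the allowed values.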


Building a Sentiment Classifier

In continuation of my previous blogs, part-1 and part-2, where we explored COVID tweet data and performed topic modelling respectively, in this part we will build a sentiment classifier.

Most frequently occurring words in positive and negative tweets.

Although basic data exploration was done in the previous parts, here is a little glimpse of the data again!

A) Preview of the dataset used.

  • A glimpse of the COVID Tweet dataset
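A minimal sketch of the classifier itself, using TF-IDF features and logistic regression from scikit-learn; the four tweets and their labels are made-up stand-ins for the real dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up tweets standing in for the COVID tweet dataset.
tweets = ["great support from local stores",
          "panic buying everywhere, shelves empty",
          "grateful for health workers",
          "worried about rising prices"]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF features feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression())
clf.fit(tweets, labels)

prediction = clf.predict(["thankful for the support"])[0]
```

On the real dataset the same pipeline would simply be fit on the tweet text and sentiment columns instead of these toy lists.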


Finding latent topics using Topic Modelling

In continuation of part-1, where we explored Twitter data related to COVID, in this post we will use topic modelling to learn more about the underlying key ideas that people are tweeting about.

Let’s first understand what Topic Modelling is!

Topic Modelling

Topic Modelling is an unsupervised technique that finds the underlying topics, also termed latent topics, present in a large collection of documents.

In the real world, we observe a lot of unlabelled text data in the form of comments, reviews, complaints, etc. …


Exploring COVID Tweet data

In this blog, I have taken the COVID tweet dataset from Kaggle and explored it to understand what people are talking about using NLP techniques.

So let’s understand the data about new normal beginnings!!

Dataset used

I have taken a dataset “Corona Virus Tagged Data” from Kaggle. The tweets have been pulled from Twitter and manual tagging has been done. The names and usernames have been given codes to avoid any privacy concerns. There are two datasets available — train.csv and test.csv. I have used train.csv for this exploratory analysis.

Columns present:-

  • UserName
  • ScreenName
  • Tweet At
  • Original Tweet
  • Label
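A sketch of the first loading step; in the post this would be `pd.read_csv("train.csv")`, but a tiny made-up frame with the same columns stands in here so the snippet is self-contained:

```python
import pandas as pd

# Made-up rows mimicking the "Corona Virus Tagged Data" columns;
# in the post this frame would come from pd.read_csv("train.csv").
df = pd.DataFrame({
    "UserName": [1, 2],
    "ScreenName": [101, 102],
    "Tweet At": ["16-03-2020", "17-03-2020"],
    "Original Tweet": ["Stocking up on groceries", "Stay safe everyone"],
    "Label": ["Neutral", "Positive"],
})

print(df.shape)
print(df["Label"].value_counts())  # class balance of the sentiment labels
```

Checking `value_counts()` on the label column is a natural first step before any of the exploration that follows.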

Pooja Mahajan
