In this article, I will take you through some of the practical considerations and challenges of using AI in medical diagnosis. Because the stakes in this domain are high, it is important to address these challenges in addition to following core AI/ML practices.

Dataset considerations


a) Resampling data (Oversampling, Undersampling, etc.)

b) Modifying the loss function, i.e. using a weighted loss to account for imbalanced classes.
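As a rough illustration of option b), here is a minimal sketch of a weighted binary cross-entropy in NumPy. The weighting scheme (each class weighted by the frequency of the other class) and the function name `weighted_bce_per_example` are assumptions made for this example; other weighting schemes are possible.

```python
import numpy as np

def weighted_bce_per_example(y_true, y_prob, eps=1e-7):
    """Per-example weighted binary cross-entropy.

    Assumption for this sketch: each class is weighted by the frequency
    of the *other* class, so the rarer class counts more.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() never sees exactly 0 or 1.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    pos_frac = y_true.mean()
    w_pos = 1.0 - pos_frac  # weight applied to positive examples
    w_neg = pos_frac        # weight applied to negative examples
    return -(w_pos * y_true * np.log(y_prob)
             + w_neg * (1.0 - y_true) * np.log(1.0 - y_prob))

# Toy imbalanced batch: one positive, three negatives, uninformative predictions.
labels = np.array([1, 0, 0, 0])
probs = np.full(4, 0.5)
losses = weighted_bce_per_example(labels, probs)
pos_total = losses[labels == 1].sum()
neg_total = losses[labels == 0].sum()
print(pos_total, neg_total)
```

With these weights, the single positive example contributes as much to the total loss as the three negative examples combined, which is the balancing effect a weighted loss is meant to have.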

2. Small training dataset

Transfer learning means adapting a network trained for one problem to a different problem. It is common to pre-train a CNN on a very large dataset (e.g. ImageNet, which contains 1.2 million images in 1000 categories), and then use it either as an initialization or as a fixed feature extractor for the task of interest.

Why use Transfer Learning?

  • It is relatively rare to have a dataset of sufficient size. With transfer learning, we can build good classifiers with a few hundred images.

Types of Transfer Learning:

In this post, we will discuss ‘Broadcasting’ using NumPy. Broadcasting is also used when implementing neural networks, because these operations are efficient in both memory and computation.

So let’s understand what Broadcasting means followed by a few examples!

Broadcasting describes the way NumPy treats arrays with different shapes during arithmetic operations. The smaller array is broadcast across the larger array so that they have compatible shapes. It provides a way to vectorize array operations, which leads to efficient implementations.

Broadcasting Examples- Image Source

The light-bordered boxes represent the broadcast values; this extra memory is not actually allocated during the operation, but it can be useful conceptually…
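To make this concrete, here is a small sketch with made-up arrays. It adds a row vector and a column vector to a 2×3 matrix, then uses `np.broadcast_to` to show that the stretched array is a view rather than a copy.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)
col = np.array([[100], [200]])   # shape (2, 1)

# The row is conceptually repeated for every row of `a`.
row_sum = a + row   # [[11, 22, 33], [14, 25, 36]]

# The column is conceptually repeated for every column of `a`.
col_sum = a + col   # [[101, 102, 103], [204, 205, 206]]

# broadcast_to returns a read-only view: the stride along the repeated
# axis is 0, so the same memory is reused instead of being copied.
row_view = np.broadcast_to(row, a.shape)
print(row_view.shape, row_view.strides[0])
```

The zero stride along the first axis is what "not actually allocated" means in practice: every row of the view points at the same three values.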

Using Python’s scipy library for implementation

In this post, I will discuss convolutions and how they act as image filters by implementing the convolution operation with a few edge detection kernels.

So let’s start!

What are Convolutions?

A convolution is a mathematical operation on two functions that produces a third function expressing how one is modified by the other.

For example, the first function can be an image and the second a matrix (the kernel) that slides over the image and transforms it. …
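As a minimal sketch of this idea, the snippet below convolves a tiny synthetic image with a Sobel kernel using `scipy.signal.convolve2d`. The image and the choice of the Sobel kernel are assumptions made for illustration; the same call works with other edge detection kernels.

```python
import numpy as np
from scipy.signal import convolve2d

# Synthetic 5x6 "image": dark on the left (0), bright on the right (1),
# so there is a single vertical edge between columns 2 and 3.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Sobel kernel that responds to horizontal intensity changes (vertical edges).
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# mode="valid" keeps only positions where the kernel fits entirely inside the image.
edges = convolve2d(image, sobel_x, mode="valid")
print(edges.shape)
print(np.abs(edges))
```

The output is non-zero only where the kernel window straddles the edge and zero over the flat regions, which is why such a kernel acts as an edge filter. Note that `convolve2d` flips the kernel (a true convolution), so the sign of the response depends on the edge direction.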

Implementation using Keras

In this post, we will cover the differences between a Fully connected neural network and a Convolutional neural network. We will focus on understanding the differences in terms of the model architecture and results obtained on the MNIST dataset.

Fully connected neural network

  • The major advantage of fully connected networks is that they are “structure agnostic”, i.e. no special assumptions need to be made about the input.
  • While being structure agnostic makes fully connected networks…
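One way to see the architectural difference is to count trainable parameters. The sketch below compares a single dense layer and a single convolutional layer on a 28×28 grayscale MNIST image. The layer sizes (128 units, 32 filters of size 3×3) are assumptions chosen for illustration, not the configuration used in the post.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, n_filters):
    """Weights plus one bias per filter in a 2D convolutional layer."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

# Fully connected: every one of the 784 pixels connects to every unit.
fc = dense_params(28 * 28, 128)

# Convolutional: each 3x3 filter is shared across all image positions.
conv = conv2d_params(3, 3, 1, 32)

print(fc, conv)  # 100480 320
```

The dense layer needs a separate weight for every pixel-unit pair, while the convolutional layer reuses the same small filters everywhere in the image.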

Many MOOCs teach everything from basic machine learning techniques to deep learning algorithms using toy datasets that are clean and small to work with, but in reality, when working with real data, 50–60% of the time is spent making it fit for analysis.

Quality of data in terms of coverage, completeness, and correctness plays a crucial role in the success of data science projects by helping provide businesses with the right insights!

With this intuition in mind, I thought of writing about which data quality checks can be performed based on…

Building a Sentiment Classifier

In continuation of my previous blogs, part-1 and part-2 where we explored COVID tweet data and performed topic modeling respectively, in this part, we will build a sentiment classifier.

Most frequently occurring words in positive and negative tweets.

Although basic data exploration was done in the previous parts, here is a quick glimpse of the data again!

A) Preview of the dataset used.

Finding latent topics using Topic Modelling

In continuation of part-1, where we explored Twitter data related to COVID, this post uses Topic Modelling to learn more about the underlying key ideas that people are tweeting about.

Let’s first understand what Topic Modelling is!

Topic Modelling

Topic Modelling is an unsupervised technique that helps find the underlying topics, also called latent topics, present in a large collection of documents.

In the real world, we observe a lot of unlabelled text data in the form of comments, reviews, complaints, etc. …
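To give a feel for the technique, here is a minimal sketch using scikit-learn's `LatentDirichletAllocation`. The choice of LDA, the number of topics, and the four example sentences are all assumptions made for this illustration; they are not taken from the tweet dataset.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Made-up, unlabelled documents (illustrative only).
docs = [
    "grocery store shelves empty panic buying food",
    "supermarket food prices rising grocery shopping",
    "vaccine trial results hospital doctors",
    "hospital staff doctors nurses vaccine",
]

# Turn each document into a bag-of-words count vector.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Fit LDA with two latent topics; each row of doc_topics is a
# probability distribution over topics for one document.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Show the three highest-weighted words for each topic.
index_to_word = {i: w for w, i in vectorizer.vocabulary_.items()}
for topic_id, weights in enumerate(lda.components_):
    top_words = [index_to_word[i] for i in weights.argsort()[::-1][:3]]
    print(topic_id, top_words)
```

No labels are supplied anywhere: the topics are inferred purely from which words tend to occur together.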

Exploring COVID Tweet data

In this blog, I have taken the COVID tweet dataset from Kaggle and explored it to understand what people are talking about using NLP techniques.

So let’s understand the data about the beginnings of the new normal!

Dataset used

I have taken a dataset “Corona Virus Tagged Data” from Kaggle. The tweets have been pulled from Twitter and manual tagging has been done. The names and usernames have been given codes to avoid any privacy concerns. There are two datasets available — train.csv and test.csv. I have used train.csv for this exploratory analysis.

Columns present:-

  • UserName
  • ScreenName
  • Tweet At
  • Original Tweet
  • Label

Extracting named entities

In this post, I discuss what we mean by a named entity, the named entity recognition technique, and how to extract named entities using spaCy.

Named Entity

The term “named entity” is traditionally used to refer to the set of person, organization, and location names encountered in a given text. Furthermore, dates, monetary units, percentages, etc. are often included and detected using the same techniques, based on local grammars.

Example:- “Facebook bought WhatsApp in 2014 for $16bn”
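As a toy illustration of the "local grammar" idea for dates and monetary amounts, the sketch below uses two hand-written regular expressions on the example sentence. This is not how spaCy works internally, and it cannot find names such as Facebook or WhatsApp; the patterns `YEAR` and `MONEY` are assumptions made for this example.

```python
import re

text = "Facebook bought WhatsApp in 2014 for $16bn"

# Four-digit years starting with 19 or 20.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

# Dollar amounts with an optional decimal part and an optional bn/m/k suffix.
MONEY = re.compile(r"\$\d+(?:\.\d+)?(?:bn|m|k)?")

years = YEAR.findall(text)
money = MONEY.findall(text)
print(years, money)  # ['2014'] ['$16bn']
```

Hand-written patterns like these only cover rigidly formatted entities, which is why a trained model is normally used for person, organization, and location names.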

Pooja Mahajan
