Multi-Label Classification and Regression
Third in a series on understanding FastAI.
Objective:
In this part, we look at two other types of computer vision problems: multi-label classification and regression. The former is when you want to predict more than one label per image, and the latter occurs when the labels are one or several numbers (quantities instead of categories).
First, we will use the PASCAL dataset, which is known for having more than one kind of classified object per image.
from fastai.vision.all import *
path= untar_data(URLs.PASCAL_2007)
df = pd.read_csv(path/'train.csv')
df.head()
Constructing a DataBlock
Before going further, we should understand the difference between a Dataset and a DataLoader:
- Dataset: a collection that returns a tuple of an independent and a dependent variable for a single item
- DataLoader: an iterator that provides a stream of mini-batches, where each mini-batch is a tuple of a batch of independent variables and a batch of dependent variables
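To make the distinction concrete, here is a minimal plain-Python sketch. It is not fastai's or PyTorch's implementation: the hypothetical `ToyDataset` returns one (x, y) tuple per index, and the hypothetical `toy_dataloader` generator groups consecutive items into mini-batches.

```python
from typing import Iterator, List, Sequence, Tuple

class ToyDataset:
    """Hypothetical dataset: indexing returns one (independent, dependent) tuple."""
    def __init__(self, xs: Sequence, ys: Sequence):
        assert len(xs) == len(ys)
        self.xs, self.ys = list(xs), list(ys)

    def __len__(self) -> int:
        return len(self.xs)

    def __getitem__(self, i: int) -> Tuple:
        return self.xs[i], self.ys[i]

def toy_dataloader(ds: ToyDataset, bs: int) -> Iterator[Tuple[List, List]]:
    """Hypothetical loader: yields (batch of xs, batch of ys) mini-batches in order."""
    for start in range(0, len(ds), bs):
        items = [ds[i] for i in range(start, min(start + bs, len(ds)))]
        xb = [x for x, _ in items]
        yb = [y for _, y in items]
        yield xb, yb

ds = ToyDataset(["a.jpg", "b.jpg", "c.jpg"], ["cat", "dog", "cat"])
print(ds[0])                            # ('a.jpg', 'cat')
print(list(toy_dataloader(ds, bs=2)))   # two mini-batches: sizes 2 and 1
```

A real DataLoader also shuffles the training set and collates each batch into tensors; this sketch omits both to keep the item/batch distinction visible.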
Using DataBlock, we will build our datasets and dataloaders step by step, starting from scratch.
dblock=DataBlock()
dsets = dblock.datasets(df) # the DataBlock creates Datasets holding a training set and a validation set
len(dsets.train), len(dsets.valid)
Let's grab the independent and dependent variables of the first item:
x,y = dsets.train[0]
x,y
x['fname'],y['labels']
dblock = DataBlock(get_x=lambda r: r['fname'], get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
dsets.train[1]
Let's work with the complete path of the inputs, and split the space-separated labels string into a list of labels.
To open the path (the independent variable) as an image, we need a conversion for each element of the tuple: ImageBlock opens the image, and MultiCategoryBlock turns the list of labels into the dependent variable.
def get_x(r): return path/'train'/r['fname']
def get_y(r): return r['labels'].split(' ')
dblock = DataBlock(blocks=(ImageBlock,MultiCategoryBlock),get_x=get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
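The long tensor of 0s and 1s in the output is the label, one-hot encoded: each position corresponds to one category in the vocabulary, and a 1 means that category is present in the image. We can decode it back into category names: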
idxs = torch.where(dsets.train[0][1]==1.)[0]
dsets.train.vocab[idxs]
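Conceptually, the encoding works like this plain-Python sketch: build a sorted vocabulary from all labels, turn each item's labels into a vector with a 1 at every position whose category is present (often called multi-hot, since several positions can be 1), and decode it back the way the `torch.where` lookup does. The helpers `build_vocab`, `encode` and `decode` and the three-item data are hypothetical; fastai's MultiCategoryBlock differs in detail.

```python
from typing import List

def build_vocab(label_strings: List[str]) -> List[str]:
    """Collect every distinct label across items, sorted alphabetically."""
    return sorted({lbl for s in label_strings for lbl in s.split(" ")})

def encode(labels: str, vocab: List[str]) -> List[float]:
    """Multi-hot vector: 1.0 at the position of each label present in the item."""
    present = set(labels.split(" "))
    return [1.0 if v in present else 0.0 for v in vocab]

def decode(vec: List[float], vocab: List[str]) -> List[str]:
    """Inverse of encode: keep the vocab entries whose position holds 1.0."""
    return [v for v, flag in zip(vocab, vec) if flag == 1.0]

raw = ["chair", "car", "person chair"]
vocab = build_vocab(raw)              # ['car', 'chair', 'person']
vec = encode("person chair", vocab)   # [0.0, 1.0, 1.0]
print(vocab, vec, decode(vec, vocab))
```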
To separate the validation items from the training items, we write a splitter that uses the is_valid column:
def splitter(df):
    # rows flagged is_valid go to the validation set, all others to training
    train = df.index[~df['is_valid']].tolist()
    valid = df.index[df['is_valid']].tolist()
    return train,valid
dblock = DataBlock(blocks=(ImageBlock,MultiCategoryBlock),
splitter=splitter,
get_x=get_x,
get_y=get_y)
dsets = dblock.datasets(df)
dsets.valid[0]
dblock = DataBlock(blocks=(ImageBlock,MultiCategoryBlock),
splitter=splitter,
get_x=get_x,
get_y=get_y,
item_tfms=RandomResizedCrop(128,min_scale=0.35))
dls=dblock.dataloaders(df)
dls.show_batch(nrows=1,ncols=3)
Now our data is ready for training.
We will create our Learner. Basically, a Learner brings together four main things:
- a DataLoaders object
- a model
- an optimizer
- a loss function
Then we grab one batch of data, split into the independent and dependent variables, and pass the independent variable through the model, which returns the activations of its last layer. These activations have shape (64, 20): the batch size (64) by the number of categories (20). The objective is then to turn them into a probability for each of the 20 categories.
learn=cnn_learner(dls,resnet18)
learn.model.cuda() # model moved to CUDA
x,y = dls.train.one_batch()
activs = learn.model(x)
activs.shape
The activations of the last layer have not been normalized yet: they are not constrained to lie between 0 and 1. So, as in mnist_loss, we first scale them with a sigmoid, and then take the log of the probability assigned to the correct outcome:
def binary_cross_entropy(inputs,targets):
    inputs = inputs.sigmoid()
    # probability of the correct outcome: p where the target is 1, 1-p where it is 0
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()
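To check the formula on numbers, here is a plain-Python sketch of the same loss for a single item: apply a sigmoid to each activation, take the probability assigned to the correct outcome (p where the target is 1, 1 - p where it is 0), and average the negative logs. The activations and targets are made up for illustration.

```python
import math
from typing import List

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(activations: List[float], targets: List[int]) -> float:
    """Mean binary cross entropy over all labels, applying the sigmoid first."""
    losses = []
    for a, t in zip(activations, targets):
        p = sigmoid(a)
        # Probability the model assigns to the correct outcome for this label.
        p_correct = p if t == 1 else 1 - p
        losses.append(-math.log(p_correct))
    return sum(losses) / len(losses)

# Activation 0.0 gives p = 0.5 for every label, so the loss is -log(0.5) = log(2).
print(bce_with_logits([0.0, 0.0, 0.0], [1, 0, 1]))
# Confident, correct activations give a small loss; confident, wrong ones a large loss.
print(bce_with_logits([4.0, -4.0, 4.0], [1, 0, 1]))
print(bce_with_logits([-4.0, 4.0, -4.0], [1, 0, 1]))
```

The leading minus sign matters: the log of a probability is negative, so without it the loss would decrease as predictions get worse.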
In PyTorch, this is available directly as F.binary_cross_entropy and its module equivalent nn.BCELoss, which calculate binary cross entropy on a one-hot encoded target but do not include the initial sigmoid. To include it, we use F.binary_cross_entropy_with_logits (or nn.BCEWithLogitsLoss), which applies the sigmoid and the binary cross entropy in a single function.
Note that we cannot use softmax and nll_loss here: softmax forces the activations of all categories to sum to 1, and nll_loss expects a single target category per item, whereas one image may contain several categories.
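A quick numerical comparison in plain Python, with made-up activations, shows the problem: softmax makes the categories compete so the outputs always sum to 1, while sigmoid scores each category independently, so two categories can both be close to 1.

```python
import math
from typing import List

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_all(xs: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]

acts = [3.0, 3.0, -2.0]    # e.g. an image containing both a person and a chair
print(softmax(acts))       # the two equal activations must share the mass: about 0.5 each
print(sigmoid_all(acts))   # both about 0.95: each label is judged on its own
```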
loss_func = nn.BCEWithLogitsLoss()
loss = loss_func(activs,y)
loss
Next, we need an accuracy metric that works for multi-label problems. Previously, for single-label classification, accuracy took the argmax (the category with the highest activation) and compared it with the target. That does not work for multi-label classification, because an image can have more than one correct label.
Instead, the idea is to compare each activation, after the sigmoid, with a threshold: every category above it is predicted as present. Picking a good threshold is important: if it is too low, we will often predict objects that are not in the image; if it is too high, we will only select the objects the model is very confident about and miss correctly labeled ones.
def accuracy_multi(inp,targ, thres=0.5, sigmoid=True):
    # fraction of label positions where (prob > thres) matches the 0/1 target
    if sigmoid: inp = inp.sigmoid()
    return ((inp>thres)==targ.bool()).float().mean()
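To see what `accuracy_multi` computes, here is a plain-Python sketch of the same idea on one made-up prediction whose values are already probabilities: each probability is compared with the threshold, and accuracy is the fraction of label positions where the thresholded prediction matches the target. The function name `accuracy_multi_py` is hypothetical. Note that it averages over every label position, not over images.

```python
from typing import List

def accuracy_multi_py(probs: List[float], targets: List[int], thres: float = 0.5) -> float:
    """Fraction of label positions where (prob > thres) agrees with the target."""
    hits = [(p > thres) == bool(t) for p, t in zip(probs, targets)]
    return sum(hits) / len(hits)

probs   = [0.9, 0.3, 0.1, 0.6]
targets = [1,   1,   0,   0]
print(accuracy_multi_py(probs, targets, thres=0.5))   # 0.5
print(accuracy_multi_py(probs, targets, thres=0.2))   # 0.75
```

Lowering the threshold to 0.2 recovers the second label (0.3 > 0.2), which is why the threshold is worth tuning.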
learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi,thres=0.2))
learn.fine_tune(3,base_lr=3e-3, freeze_epochs=4)
To find the right threshold, we try several values and see which works best. Since get_preds already applies the sigmoid (the activation of the loss function), we pass sigmoid=False:
preds,targs = learn.get_preds()
xs = torch.linspace(0.05,0.95,29)
accs = [accuracy_multi(preds,targs,thres=i,sigmoid=False) for i in xs]
plt.plot(xs,accs)
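The same search can be written without plotting: evaluate accuracy at evenly spaced thresholds (mirroring `torch.linspace(0.05,0.95,29)`) and keep the best one. This is a plain-Python sketch with made-up predictions; the helpers `accuracy_at` and `best_threshold` are hypothetical, and on real data you would use the `preds` and `targs` returned by `learn.get_preds()`.

```python
from typing import List, Tuple

def accuracy_at(probs: List[List[float]], targs: List[List[int]], thres: float) -> float:
    """Multi-label accuracy over all (item, label) positions at one threshold."""
    hits = total = 0
    for p_row, t_row in zip(probs, targs):
        for p, t in zip(p_row, t_row):
            hits += (p > thres) == bool(t)
            total += 1
    return hits / total

def best_threshold(probs, targs, lo=0.05, hi=0.95, n=29) -> Tuple[float, float]:
    """Evaluate n evenly spaced thresholds in [lo, hi]; return (threshold, accuracy)."""
    step = (hi - lo) / (n - 1)
    candidates = [lo + i * step for i in range(n)]
    scores = [(accuracy_at(probs, targs, th), th) for th in candidates]
    acc, th = max(scores)   # ties broken by the larger threshold
    return th, acc

probs = [[0.8, 0.25, 0.1], [0.3, 0.9, 0.15]]
targs = [[1,   1,    0],   [1,   1,   0]]
print(best_threshold(probs, targs))   # a threshold of about 0.243, with accuracy 1.0
```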
Note that here we have used the validation set to pick a hyperparameter (the threshold), which is exactly what a validation set is for.
Regression
Unlike classification, where the dependent variable is a set of categories, the dependent variable in a regression problem is one or more continuous numbers; for instance, we could predict product purchases from images, text, or tabular data.
As an example of regression, we will predict the position of the center of the head in images, using the Biwi Kinect Head Pose dataset.
path = untar_data(URLs.BIWI_HEAD_POSE)
path.ls().sorted()
There are 24 directories numbered from 01 to 24 (one for each person photographed), each with a corresponding .obj file. Let's look inside one of these directories:
(path/'01').ls().sorted()
Inside each subdirectory there are different frames, each of which comes with an image (_rgb.jpg) and a pose file (_pose.txt). We can write a function that returns the pose file corresponding to each image file:
img_files = get_image_files(path)
def img2pose(x): return Path(f'{str(x)[:-7]}pose.txt' )
img2pose(img_files[0])
im = PILImage.create(img_files[0])
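The slicing `str(x)[:-7]` in `img2pose` relies on every image name ending in `rgb.jpg` (7 characters). A sketch of the same mapping with `pathlib` makes that assumption explicit and fails loudly otherwise; the function name and the example path are hypothetical, chosen to follow the `_rgb.jpg` / `_pose.txt` naming described above.

```python
from pathlib import Path

def img2pose_explicit(img: Path) -> Path:
    """Map '<prefix>rgb.jpg' to '<prefix>pose.txt' in the same directory."""
    suffix = "rgb.jpg"
    if not img.name.endswith(suffix):
        raise ValueError(f"unexpected image name: {img.name}")
    return img.with_name(img.name[: -len(suffix)] + "pose.txt")

example = Path("biwi_head_pose/01/frame_00003_rgb.jpg")   # hypothetical path
print(img2pose_explicit(example))   # biwi_head_pose/01/frame_00003_pose.txt
```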
To get the center of the head in each image, we write a function that reads it from the pose file and converts it to pixel coordinates:
cal = np.genfromtxt(path/'01'/'rgb.cal',skip_footer=6)
def get_ctr(f):
    ctr=np.genfromtxt(img2pose(f),skip_header=3)
    c1 = ctr[0]*cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1]*cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])
get_ctr(img_files[0])
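The two lines computing `c1` and `c2` have the form of a standard pinhole-camera projection of a 3D point (X, Y, Z) onto the image. Assuming, as the indexing suggests, that `cal[0][0]` and `cal[1][1]` are the focal lengths fx, fy and `cal[0][2]`, `cal[1][2]` the principal point cx, cy, the projection can be sketched in plain Python. The function name and all numbers below are made up for illustration.

```python
from typing import Sequence, Tuple

def project_point(xyz: Sequence[float], fx: float, fy: float,
                  cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a 3D camera-space point to pixel coordinates."""
    x, y, z = xyz
    u = x * fx / z + cx   # same form as c1 in get_ctr
    v = y * fy / z + cy   # same form as c2 in get_ctr
    return u, v

# Made-up intrinsics and head position, for illustration only.
print(project_point((100.0, -50.0, 1000.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0))
# (370.0, 215.0)
```

Dividing by Z is what makes the same head offset appear smaller in pixels when the person stands farther from the camera.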
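We can pass this function to DataBlock as get_y, since it is responsible for labelling each item. We use PointBlock for the target, and hold out all images of person 13 as the validation set, so that the model is validated on a person it has never seen.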
biwi = DataBlock( blocks=(ImageBlock,PointBlock),
get_items=get_image_files,
get_y=get_ctr,
splitter=FuncSplitter(lambda o: o.parent.name=='13'),
batch_tfms=[*aug_transforms(size=(240,320)),
Normalize.from_stats(*imagenet_stats)])
dls = biwi.dataloaders(path)
dls.show_batch(max_n=9, figsize=(8,6))
Let's train with cnn_learner. fastai rescales point coordinates to the range -1 to 1, so we pass y_range=(-1,1). y_range is implemented in fastai with sigmoid_range, and it tells the model which range of values to expect for the dependent variable.
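fastai's `sigmoid_range(x, low, high)` rescales a sigmoid so that its output lies between `low` and `high`. A plain-Python sketch of the same formula (the name `sigmoid_range_py` is ours, not fastai's):

```python
import math

def sigmoid_range_py(x: float, lo: float, hi: float) -> float:
    """Map any real activation into the open interval (lo, hi)."""
    return 1.0 / (1.0 + math.exp(-x)) * (hi - lo) + lo

for act in (-10.0, 0.0, 10.0):
    print(act, sigmoid_range_py(act, -1.0, 1.0))
# 0.0 maps to the midpoint 0.0; large |x| approaches -1 or 1 but never reaches them
```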
learn = cnn_learner(dls,resnet18,y_range=(-1,1))
By default, fastai chooses MSELoss as the loss function for this task, since it measures how close the predictions are to the targets:
dls.loss_func
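Mean squared error averages the squared differences between predicted and target coordinates. A small plain-Python sketch with made-up points in the -1 to 1 coordinate space also shows how to read the loss: its square root is a typical per-coordinate error in those units.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def mse(preds: List[Point], targets: List[Point]) -> float:
    """Mean of squared differences over every coordinate of every point."""
    sq = [(p - t) ** 2 for pp, tt in zip(preds, targets) for p, t in zip(pp, tt)]
    return sum(sq) / len(sq)

preds   = [(0.10, -0.20), (0.50, 0.40)]
targets = [(0.12, -0.20), (0.50, 0.38)]
loss = mse(preds, targets)
print(loss, math.sqrt(loss))   # about 0.0002 and 0.014
```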
Let's pick a good learning rate
learn.lr_find()
Then, we will try a learning rate of 0.5e-2
lr=0.5e-2
learn.fine_tune(3,lr)
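Amazingly, the obtained loss is around 0.000123, which is remarkably accurate: since it is a mean squared error on coordinates scaled to -1 to 1, it corresponds to a typical error of about sqrt(0.000123) ≈ 0.011. To compare the targets with the predictions, let's show a few results: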
learn.show_results(ds_idx=1,max_n=3,figsize=(6,8))
So, with transfer learning and fastai's flexible API, we can build a really good regression model!