Train an Autoencoder in Fastai, in 5 steps

This is a quick post to understand the coding philosophy of fastai v2.0. I wanted to train an autoencoder on a set of images using the fastai library, and here’s how it went. (It is easy, but only if you know what you are looking at.)
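To give a flavour of what that looks like, here is a minimal sketch of an autoencoder pipeline in fastai v2 (using MNIST_SAMPLE as a stand-in image set and a deliberately tiny model; this is an illustration, not the actual five steps from the post):

```python
from fastai.vision.all import *

# Stand-in dataset; any folder of images would be loaded the same way
path = untar_data(URLs.MNIST_SAMPLE)

# Both the input and the target are the image itself
dblock = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), ImageBlock(cls=PILImageBW)),
    get_items=get_image_files,
    get_y=lambda p: p,              # the target is the same file as the input
    item_tfms=Resize(28),
)
dls = dblock.dataloaders(path, bs=64)

# A deliberately tiny encoder/decoder pair
class AutoEncoder(Module):
    def __init__(self):
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())
        self.dec = nn.Sequential(nn.Linear(64, 28 * 28), nn.Sigmoid())
    def forward(self, x):
        return self.dec(self.enc(x)).view(x.shape)

learn = Learner(dls, AutoEncoder(), loss_func=MSELossFlat())
learn.fit_one_cycle(5, 1e-3)
```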

... continue reading ...

Tokenizers in Telugu Text

సహస్రనామావళి (a litany of 1000 names) are commonly associated with Hindu Gods/Goddesses, where a Deity’s characteristics are extolled through as many names. I’ve gotten interested in Garikapaati Narasimha Rao’s talks, specifically one called శ్రీ లలితా సహస్రనామావళి (Lalitha sahasranamam). At one point he explains the counts of her names grouped by their first character, and how a few groups have a lot more names than others. I took it up as a small task in Telugu NLP to count these groups myself.
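As a flavour of the counting itself, here is a minimal sketch with a handful of the opening names (typed from memory, so treat them as placeholder data); a fuller tokenizer would work with whole aksharas (grapheme clusters) rather than raw code points:

```python
from collections import Counter

# A few sample names; the real list would come from the sahasranamavali text
names = [
    "శ్రీమాతా", "శ్రీమహారాజ్ఞీ", "శ్రీమత్సింహాసనేశ్వరీ",
    "చిదగ్నికుండసంభూతా", "ఉద్యద్భానుసహస్రాభా",
]

# Group by the first code point, i.e. the base consonant/vowel of the first
# akshara. Note that one akshara can span several code points (శ + ్ + ర + ీ
# in శ్రీ), so a proper treatment would use grapheme clusters instead.
counts = Counter(name[0] for name in names)
for letter, count in counts.most_common():
    print(letter, count)
```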

... continue reading ...

Probability vs Density

Probability is a weird child of math and science. One wants to quantify how uncertain one is, given what one knows. But to top it off, there is an even weirder pet in the family - probability density.

“I mean, what?”

-sizhky

Who wants to know how densely uncertain one is? To talk about density in normal science, one needs a normalizing factor - like how heavy something is per m³. Where is that per-ness in a probability density function?

It doesn’t even make sense.

Until it does. And once I understood it, it was brilliant.
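A quick sketch of where the per-ness hides, using a variable measured in metres purely as an illustration:

$$P(a \le X \le b) = \int_a^b f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1$$

The probability on the left is a pure number, and $dx$ carries the units of $x$ (metres, here), so $f(x)$ must carry units of per-metre: probability per unit of $x$. That is the normalizing factor, and it is also why a density can happily exceed 1 as long as its integral stays at 1.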

... continue reading ...

The Kernel Trick in Neural Networks

Data-science is the science of finding patterns in data. Given lots and lots of points on a 2D plane, they could all be lying on a line, or in some organized pattern (maybe they all form a circle), or there could be clumps of data here and there, or a combination of these (maybe they all lie on two circles).

All machine learning algorithms are specialized to do one or more of these tasks, i.e., finding the line(s) of best fit [such as Linear Regression], finding the line(s) of best separation [such as Decision Trees and Logistic Regression], or both [such as SVMs and Neural Networks].

Even more basically, data-science is about finding lines, and linear regression is the easiest to explain: find the coefficients of the line that overlaps the points best (for regression) or the line that divides them best (for classification). But there is an obvious assumption in the method - that the data is linear, or linearly separable. This is almost never true in practice; there will always be some curve that fits or separates the clouds better. Hence the need for non-linear models.

The kernel trick is one of the coolest things I’ve learned in ML. We have built methods that are fundamentally based on drawing lines (Linear Regression, SVMs and Neural Networks can ultimately only draw lines). The genius of the kernel trick is to transform the data such that the clouds simply move apart, and a line can then be drawn.
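Here is a minimal sketch of that idea (strictly speaking an explicit feature map rather than the kernel trick proper, which avoids computing the transform at all, but the transform-then-draw-a-line intuition is the same): two concentric clouds that no straight line can separate in 2D become trivially separable once we add the hand-picked feature x² + y².

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two concentric rings of points: inner ring is class 0, outer ring is class 1
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.concatenate([rng.normal(1.0, 0.1, 100),   # inner ring
                    rng.normal(3.0, 0.1, 100)])  # outer ring
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.array([0] * 100 + [1] * 100)

# A straight line in the original 2D space cannot separate the rings
print("accuracy in 2D:", LogisticRegression().fit(X, y).score(X, y))

# Lift the data by adding x^2 + y^2 as a third coordinate; now a plane
# (a "line" in the lifted space) separates the rings perfectly
X_lifted = np.c_[X, (X ** 2).sum(axis=1)]
print("accuracy after lifting:", LogisticRegression().fit(X_lifted, y).score(X_lifted, y))
```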

Think of it like this: there is a river separating the land in two. We can describe the two lands as separated by a crooked line, or we can stretch the land itself like a blanket; with the right kind and amount of stretching, the river looks straight and the two lands on either side are warped instead.

That’s the simple trick. SVMs and (for this particular topic) Neural Networks are basically about that. While an SVM has an explicit bias about the type of kernel that must be used, Neural Networks take it a step further and find their own kernel.
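Continuing the toy rings from the sketch above, the contrast looks roughly like this: the SVM is told which kernel to warp the space with (RBF, as an illustrative choice), while the small network has to learn an equivalent warping in its hidden layer.

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# X, y are the concentric rings from the previous sketch
svm = SVC(kernel="rbf").fit(X, y)                 # kernel chosen by us
print("RBF-kernel SVM:", svm.score(X, y))

mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000,
                    random_state=0).fit(X, y)     # transformation learned from data
print("small neural net:", mlp.score(X, y))
```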

... continue reading ...