Unlocking the Secrets of Text Understanding

Hello there! On Saturday, September 9th, 2023, I was on the supercomputing stand for the Hull Science Festival with a cool demo that showcases how artificial intelligences understand and process text. Today, I’m excited to announce that the demo is now available online here on my website!

In this blog post, I’ll provide a quick explanation of what you’re looking at, but if you’re impatient, you can just find the demo here: .

All artificial intelligences (AIs) currently developed are essentially complex parametrised mathematical models. We train these models by updating their parameters little by little until the model’s output is close to a ground truth label for the same input. In other words, an AI is just a bunch of math!
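To make "updating parameters little by little" concrete, here is a minimal sketch in Python. It is a toy I made up for illustration, not the model behind the demo: the model is `y = w * x` with a single parameter `w`, and the function name, learning rate, and data are all assumptions.

```python
def train(inputs, labels, learning_rate=0.01, steps=1000):
    """Fit the toy model y = w * x by gradient descent (illustrative only)."""
    w = 0.0  # start with a poor guess for the single parameter
    for _ in range(steps):
        # Gradient of the mean squared error between model output and labels
        grad = sum(2 * (w * x - y) * x for x, y in zip(inputs, labels)) / len(inputs)
        # Nudge the parameter a little in the direction that reduces the error
        w -= learning_rate * grad
    return w


inputs = [1.0, 2.0, 3.0, 4.0]
labels = [2.0, 4.0, 6.0, 8.0]  # made-up ground truth that follows y = 2x
w = train(inputs, labels)
print(round(w, 3))
```

Real models have millions or billions of parameters rather than one, but the loop has the same shape: compare the output to the label, then adjust.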

So, how does it understand text? The answer lies in converting text to numbers – a process often called ‘word embedding’. This is done by splitting an input sentence into words, and then converting each word individually into a series of numbers, which is what you’ll see in the demo. Similar sorts of words end up with similar numbers, and so sit close together in 3D space in the demo.
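As a rough illustration of that idea, the sketch below splits a sentence into words and looks each one up in a tiny table of numbers. The three vectors are hand-made for this example (real embeddings are learned from lots of text), and `embed` and `cosine_similarity` are hypothetical helper names, not part of the demo.

```python
import math

# Hand-made 3D vectors, invented purely for illustration
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def embed(sentence):
    """Split a sentence into words and convert each known word to its vector."""
    words = sentence.lower().split()
    return [toy_embeddings[word] for word in words if word in toy_embeddings]


def cosine_similarity(a, b):
    """Return a score near 1.0 when two vectors point in a similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vectors = embed("The cat chased the dog")
cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])
print(cat_dog > cat_car)
```

Cosine similarity is only one common way to compare word vectors; in the demo, closeness is simply distance between points in 3D space.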

In the demo, you’ll see clouds of words processed from Wikipedia. I downloaded a bunch of page abstracts for Wikipedia in multiple languages, extracted a list of words, converted them to numbers using GloVe and then plotted them in 3D space. Can you identify every language displayed here?
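The extract-and-convert steps can be sketched roughly like this. It assumes word vectors stored in the plain-text layout that pre-trained GloVe files use (a word followed by its numbers, separated by spaces, one word per line). The sample vectors, sentence, and function names are made up, and this is not the code behind the demo. Pre-trained GloVe vectors usually have many more than three numbers per word, so getting to 3D needs a reduction step that isn't shown here.

```python
import io
import re


def load_glove_text(handle):
    """Parse GloVe-style text: each line is a word followed by its numbers."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def extract_words(abstract):
    """Lowercase the text and pull out runs of word characters (Unicode-aware)."""
    return re.findall(r"\w+", abstract.lower())


# Stand-in for a real vectors file; these numbers are invented
sample_file = io.StringIO("hull 0.1 0.2 0.3\ncity 0.2 0.1 0.4\n")
vectors = load_glove_text(sample_file)

words = extract_words("Hull is a city in Yorkshire.")
# Keep only the words we have vectors for; each becomes one point in the cloud
points = {word: vectors[word] for word in words if word in vectors}
print(points)
```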

If you were one of the lucky people who saw my demo in person, you may notice that this online demo looks different from the one I originally presented at the science festival. That’s because the in-person demo uses data from social media, but this one uses data from Wikipedia to preserve privacy.

I hope you enjoy the demo! Time permitting, I’ll be back with more posts soon to explain how I did this and the AI/NLP theory behind it at a more technical level. Some topics I want to talk about include:

* How word embedding works

* The theory behind the GloVe algorithm

* How to create your own word embeddings using Python

Until next time, I’ll leave you with two pictures I took on the day: .

Edit 2023-11-30: Oops! I forgot to link to the source code! If you’d like to take a gander at the source code behind the demo, you can find it here:
