What can Don Quixote have in common with Artificial Intelligence? How can an AI learn from Don Quixote and create contextualized text in its style? We will answer these questions in this article and show, in a simple way, how a mathematical rationale hides behind that "literary" artificial intelligence.

The objective of this article is to show, in a visual and simple way, how a Deep Learning model can predict or create new text based on previous learning.

In this case we have used our own model, pre-trained on the first 50 chapters of Don Quixote. This training has given it enough knowledge to create contextualized text from a sentence we provide.

Without going into much technical detail about the model, since that is not the goal of this article, it was created using #Python and the #TensorFlow library. The creation steps were:
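The article does not reproduce the exact architecture, so here is a minimal sketch of what such a text-generation model might look like in TensorFlow. The vocabulary size and embedding dimension come from figures mentioned later in the article (~14,000 words, 50 dimensions); the sequence length and the LSTM layer are assumptions, not the author's exact design.

```python
import tensorflow as tf

# Assumed hyperparameters: ~14,000 unique words and a 50-dimensional
# embedding, as described later in the article.
VOCAB_SIZE = 14000
EMBEDDING_DIM = 50
SEQ_LEN = 10  # hypothetical input sequence length

model = tf.keras.Sequential([
    # Learns one 50-dimensional vector per word; this layer's weight
    # matrix is the one visualized later with PCA.
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    # A recurrent layer to capture word order (architecture assumed).
    tf.keras.layers.LSTM(128),
    # One probability per vocabulary word for the next-word prediction.
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
])
model.build(input_shape=(None, SEQ_LEN))
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

After training on the text, the embedding layer's weights can be extracted and projected, as the rest of the article explains.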

Where is the magic?

There really is no magic: during training the model learned patterns that allow it to predict the next word from a given one. But how does it choose that next word? It chooses it by proximity, that is, it picks the word closest to the previous one according to a weight matrix like this one.

This weight matrix has in its rows (instances) all the unique words in the first 50 chapters of Don Quixote, roughly 14,000. And the columns? The 50 columns are the possible outputs of our neural network: as mentioned in the first paragraphs of this article, our Deep Learning model predicts a text of 50 words based on the ones we give it.

Now that we have this data we can convert those weights into distances, and this is the point where we will represent them visually, using programming and mathematics.

Let's start

To make a two-dimensional representation we need the help of the Principal Component Analysis (PCA) technique, because our neural network's weight vectors have 50 dimensions (50 columns), exactly the size of the output layer, that is, the number of words to be created.

Since it is impossible to plot in 50 dimensions, we must reduce the dimensionality while losing as little information as possible; that is why we use PCA.

What is PCA?

It is a simple but popular and useful linear transformation technique used in numerous applications, such as stock market prediction and gene expression data analysis, among many others. The goal is to reduce the dimensions of a d-dimensional data set by projecting it onto a k-dimensional subspace (where k < d). The steps are:

  • Standardize the data.
  • Obtain the eigenvectors and eigenvalues of the covariance matrix or correlation matrix, or perform singular value decomposition.
  • Sort the eigenvalues in descending order and choose the k eigenvectors that correspond to the k largest eigenvalues, where k is the number of dimensions of the new feature subspace (k ≤ d).
  • Construct the projection matrix W from the k selected eigenvectors.
  • Transform the original data set X through W to obtain a k-dimensional feature subspace Y.
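The steps above can be sketched with NumPy. Here a random matrix stands in for the real weight matrix (the shape of 100 "words" by 50 dimensions is only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the real weight matrix: 100 "words" x 50 dimensions.
X = rng.normal(size=(100, 50))

# 1. Standardize the data (zero mean, unit variance per column).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Eigen-decomposition of the covariance matrix.
cov = np.cov(X_std, rowvar=False)
eig_vals, eig_vecs = np.linalg.eigh(cov)

# 3. Sort the eigenvalues in descending order and keep the top k.
order = np.argsort(eig_vals)[::-1]
k = 2

# 4. Projection matrix W from the k leading eigenvectors.
W = eig_vecs[:, order[:k]]

# 5. Project the standardized data onto the k-dimensional subspace.
Y = X_std @ W
print(Y.shape)  # (100, 2)
```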

All of this can be calculated manually, but for this article we have used the Scikit-learn library to automate the process.

If you want to see how all of this can be computed mathematically using Python, I have it programmed in the project notebook uploaded to GitHub here!

Explaining the code

In the first three lines of our code cell we create the PCA object, passing it as a condition the creation of 2 components (remember, we must represent in 2 dimensions). Then we transform the weights into just two columns using PCA, and finally we create a dataset with the results, naming the two principal components as columns (PCA_1 and PCA_2).
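Those three steps might look like the following sketch (a random matrix stands in for the real model weights; the variable names `pca` and `df_pca` are assumptions, not the author's exact code):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Stand-in weights: in the article these come from the Embedding layer
# of the trained model (~14,000 words x 50 dimensions).
weights = np.random.default_rng(0).normal(size=(14000, 50))

# 1. Create the PCA object, asking for 2 components to plot in 2-D.
pca = PCA(n_components=2)
# 2. Project the 50-dimensional weights down to two columns.
weights_2d = pca.fit_transform(weights)
# 3. Build a dataset with the two principal components as columns.
df_pca = pd.DataFrame(weights_2d, columns=["PCA_1", "PCA_2"])
```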

The next lines of code join that principal-component dataset with the word list, so that each word is matched to its corresponding PCA_1 and PCA_2.
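One way that join can be done, assuming the word list and the PCA output share the same row order (the tiny word list here is purely illustrative):

```python
import pandas as pd

# Hypothetical word list aligned row-by-row with the PCA output.
words = pd.DataFrame({"word": ["en", "un", "lugar", "de", "la", "mancha"]})
df_pca = pd.DataFrame(
    [[0.1, 0.2], [0.3, 0.1], [-0.2, 0.4], [0.0, -0.1], [0.2, 0.2], [-0.3, 0.3]],
    columns=["PCA_1", "PCA_2"],
)

# Rows appear in the same order in both frames, so a positional
# concatenation pairs each word with its two components.
df_words = pd.concat([words, df_pca], axis=1)
print(df_words.head())
```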

The weights variable I used to fit the PCA object holds the weights of my model, which you can easily obtain from a pre-trained model using:

e = model.layers[0]
weights = e.get_weights()[0]

Once we have all the words of Don Quixote expressed in terms of two variables, we can make a graph.

For this case we have used the Seaborn library with a scatter (point) plot.
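A minimal sketch of such a plot, assuming the joined dataset from the previous step (the three example words and coordinates here are made up):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative subset of the word/PCA dataset.
df_words = pd.DataFrame({
    "word": ["quijote", "sancho", "caballero"],
    "PCA_1": [0.0, 0.5, -0.3],
    "PCA_2": [0.0, -0.2, 0.4],
})

# One point per word in the 2-D PCA space.
ax = sns.scatterplot(data=df_words, x="PCA_1", y="PCA_2")
# Label each point with its word so clusters are readable.
for _, row in df_words.iterrows():
    ax.annotate(row["word"], (row["PCA_1"], row["PCA_2"]))
plt.savefig("quijote_pca.png")
```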

Can we know the distance between them?

Sure. Mathematically we can use, for example, the Euclidean distance to know how far some words are from others. For example, look at this practical case, built to measure the distance of each word from the word "Quixote".
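In 2-D the Euclidean distance between two words is simply the straight-line distance between their points. A sketch with made-up coordinates (the words and values here are illustrative, not the article's actual results):

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D coordinates after PCA; "quijote" is the
# reference word (lowercased, as during training).
df = pd.DataFrame({
    "word": ["quijote", "sancho", "rocinante"],
    "PCA_1": [0.0, 3.0, 1.0],
    "PCA_2": [0.0, 4.0, 0.0],
}).set_index("word")

# Euclidean distance of every word to the reference word:
# sqrt((x - x_ref)^2 + (y - y_ref)^2), row by row.
ref = df.loc["quijote"]
distances = np.sqrt(((df - ref) ** 2).sum(axis=1))
print(distances)  # quijote -> 0.0, sancho -> 5.0, rocinante -> 1.0
```

The reference word's distance to itself is 0, which is exactly the sanity check the article mentions next.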

Column «0» shows the distance of each word from "Quixote".

It can be seen that if we search this matrix for the word «don quixote» (we put it in lowercase to facilitate the training of the model), its Euclidean distance is 0.

And in 3 Dimensions ...

Of course: we simply create our PCA keeping 3 principal components instead of two, as in the previous case. This will even be more accurate, because reducing by one dimension fewer loses less information.

Here you have the example made in Python
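A sketch of the 3-D version, again with a random stand-in for the real weight matrix. The comparison at the end illustrates the claim above: three components retain at least as much variance as two.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the model's weight matrix (1,000 "words" x 50 dims).
weights = np.random.default_rng(0).normal(size=(1000, 50))

# Same procedure as before, but keeping 3 principal components.
pca3 = PCA(n_components=3)
weights_3d = pca3.fit_transform(weights)
print(weights_3d.shape)  # (1000, 3)

# Keeping a third component retains at least as much variance as the
# 2-D projection, i.e. less information is lost.
pca2 = PCA(n_components=2).fit(weights)
print(pca3.explained_variance_ratio_.sum()
      >= pca2.explained_variance_ratio_.sum())  # True
```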

And now we can visualize it better ...

Now that we have explained how the data behind a text-predicting Deep Learning model can be visualized, we are going to make use of TensorFlow's TensorBoard project, which will allow us to build a highly interactive visualization of our data using exactly the techniques we have just explained.

This really is the objective of this article: to explain visually, and without going into too many specific details (although a few have slipped in), how the prediction or creation of text by an artificial intelligence rests on a mathematical and algorithmic rationale, which we have visualized in order to understand it in a simple way. Of course, if you want to know more about this article or its details, you can access the complete project on my GitHub.


Here you have a video of how it turned out, as well as the link to the interactive website so that you can use it yourself.

Try it yourself: 3D visualization Quixote dimensions