Finding Free Associations
Free association is a psychoanalytic technique that was developed by Sigmund Freud and is still used by some therapists today. Patients say whatever thoughts come to mind, which lets the therapist learn more about how the patient thinks and feels. As Freud described it: "The importance of free association is that the patients spoke for themselves, rather than repeating the ideas of the analyst; they work through their own material, rather than parroting another's suggestions."
In one of our previous posts, "Word2Vec2Graph - Psychoanalysis Topics", we showed how to find free associations using the Word2Vec2Graph technique. In this post we will show a different method: unsupervised Convolutional Neural Network (CNN) classification. For the text file we will use Wikipedia data about psychoanalysis.
Word Pair Classification - Step by Step
We will convert word pairs to vectors, then convert the vectors to images, then classify the images with a CNN. To transform pairs of words into images we will use the method described in Ignacio Oguiza's notebook Time series - Olive oil country. The technique we use in this post differs from the one we used in our previous post:
- Read the text file, tokenize it, and remove stop words
- Transform the text into pairs of words that appear next to each other
- Read a trained Word2Vec model and map words to vectors
- Concatenate each word vector with itself, reversing the second copy: {word1, word1} pairs generate symmetrical (mirror) sequences of numbers. Label these sequences "Same".
- Concatenate the word vectors of pairs {word1, word2}, reversing the word2 vector. Label these sequences "Different".
- Randomly select a subset of "Different" pairs.
- Convert the vectors to images and run a CNN classification model.
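The labeling and subsampling steps above can be sketched as follows. This is a minimal sketch under stated assumptions. The word list and pairs are hypothetical toy data. The subset size `n_different`, the fixed `seed`, and the choice to drop adjacent pairs of identical words from the "Different" class are illustrative choices, not settings taken from the post.

```python
import random

def label_pairs(words, pairs, n_different, seed=42):
    """Build labeled (pairType, word1, word2) rows.

    'Same' rows pair each distinct single word with itself (mirror sequences).
    'Different' rows come from adjacent word pairs; a random subset of
    size n_different is kept (subset size and seed are assumptions).
    """
    same = [("Same", w, w) for w in sorted(set(words))]
    # Assumption: drop accidental {w, w} neighbours so 'Different' means two distinct words.
    different = [("Different", a, b) for a, b in pairs if a != b]
    rng = random.Random(seed)  # seeded for a reproducible subset
    k = min(n_different, len(different))
    return same + rng.sample(different, k)

# Hypothetical toy input.
words = ["freud", "dream", "analysis", "patient"]
pairs = [("freud", "dream"), ("dream", "analysis"),
         ("analysis", "patient"), ("patient", "patient")]
rows = label_pairs(words, pairs, n_different=2)
for row in rows:
    print(row)
```

Subsampling the "Different" pairs keeps the two classes closer in size. That is one plausible motivation for the random-subset step.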
Unsupervised Image Classification
In short, we concatenate pairs of vectors, transform the concatenated vectors into images, and classify the images. The CNN classifier separates "Same" (mirror) images from "Different" (non-mirror) images. Images that look similar to mirror images represent pairs of similar words, that is, common associations. Images that are very different from mirror images represent pairs of words that are not expected together, i.e. the "free associations" that psychoanalysis is looking for.
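One way to turn this reasoning into a ranking is sketched below, assuming the trained classifier outputs, for each word pair, a probability that its image belongs to the "Same" class. Pairs with a high probability behave like mirror images and are common associations. Pairs with a low probability are candidate free associations. The function name and the probabilities are hypothetical placeholders, not model output from the post.

```python
def rank_associations(p_same):
    """Sort word pairs from most 'free' (lowest P(Same)) to most common association.

    p_same maps 'word1~word2' to an assumed classifier probability that
    the pair's image belongs to the 'Same' (mirror) class.
    """
    return sorted(p_same.items(), key=lambda kv: kv[1])

# Hypothetical probabilities for illustration only, not real results.
p_same = {"alpha~beta": 0.91, "gamma~delta": 0.07, "epsilon~zeta": 0.48}
for pair, p in rank_associations(p_same):
    print(f"{pair}: P(Same) = {p:.2f}")
```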
This technique allows us to do unsupervised CNN classification: the "Same" and "Different" labels are generated from the data itself rather than annotated by hand. Of course, this method is not limited to word pair classification. In particular, it can be applied to unsupervised outlier detection.
For example, we can take stock price time series (TS) data, concatenate each TS vector with its own reversed copy, and get 'mirror' vectors/images. Then we can concatenate TS vectors with reversed market index vectors (like the S&P 500) and convert them to images. The CNN classifier will find {TS vector, S&P 500 vector} images that are very different from mirror images. These images will represent stock price outliers.
Read and Clean Text File
Read the text file, tokenize it and remove stop words:
Get Pairs of Words
Get pairs of words from the text, then explode the n-grams:
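The reading, tokenizing, and pairing steps can be sketched in plain Python. This is a minimal stand-in, not the post's own pipeline. The tiny hard-coded stop-word list is an assumption, since the post does not say which list it used. Pairs are formed after stop-word removal, following the order of the steps above. The sample sentence is illustrative.

```python
import re

# Assumption: a small illustrative stop-word list, not the one used in the post.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was",
              "by", "that", "for", "it"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def adjacent_pairs(tokens):
    """Explode a token sequence into bigrams: (w1, w2), (w2, w3), ..."""
    return list(zip(tokens, tokens[1:]))

text = "Psychoanalysis is a set of theories and techniques developed by Sigmund Freud."
tokens = tokenize(text)
print(tokens)
print(adjacent_pairs(tokens))
```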
Vectors for Pairs of Words
Read the trained Word2Vec model:
Map the words of each word pair to the Word2Vec model and get sets: {word1, vector1, word2, vector2}:
Get single words with their vectors from the word pairs: {word1, vector1}:
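The two mapping steps can be sketched as below. A plain dictionary of hypothetical 4-dimensional vectors stands in for the trained Word2Vec model, because the post does not show how the model is loaded. Skipping out-of-vocabulary words is likewise an assumption.

```python
import numpy as np

# Hypothetical vectors standing in for a trained Word2Vec model.
model = {
    "freud": np.array([0.2, -0.1, 0.4, 0.3]),
    "dream": np.array([0.1, 0.5, -0.2, 0.0]),
    "analysis": np.array([-0.3, 0.2, 0.1, 0.6]),
}

def pair_vectors(pairs, model):
    """Return (word1, vector1, word2, vector2) tuples; skip out-of-vocabulary words."""
    return [(w1, model[w1], w2, model[w2])
            for w1, w2 in pairs if w1 in model and w2 in model]

def single_word_vectors(pair_rows):
    """Collect distinct {word: vector} entries from the mapped pairs."""
    singles = {}
    for w1, v1, w2, v2 in pair_rows:
        singles.setdefault(w1, v1)
        singles.setdefault(w2, v2)
    return singles

pairs = [("freud", "dream"), ("dream", "analysis"), ("analysis", "unconscious")]
rows = pair_vectors(pairs, model)  # 'unconscious' is not in the toy model
singles = single_word_vectors(rows)
print(len(rows), sorted(singles))
```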
Combine Vectors of Word Pairs
Combine the vectors of word pairs {word1, word2}, reversing the second vector. Combine the vector of each single word with itself, again reversing the second vector.
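The combination rule can be illustrated with NumPy. Concatenating a vector with its own reversed copy always yields a palindrome, which is the "mirror" property the classifier learns to recognize. A distinct second word generally breaks that symmetry. The vectors are hypothetical 3-dimensional examples.

```python
import numpy as np

def mirror_concat(v1, v2):
    """Concatenate v1 with v2 reversed: [v1[0], ..., v1[n-1], v2[n-1], ..., v2[0]]."""
    return np.concatenate([v1, v2[::-1]])

v_a = np.array([0.3, -0.2, 0.5])   # hypothetical word vector
v_b = np.array([0.4, -0.1, 0.2])   # hypothetical word vector

same = mirror_concat(v_a, v_a)       # "Same": reads identically in both directions
different = mirror_concat(v_a, v_b)  # "Different": not a palindrome
print(same, np.allclose(same, same[::-1]))
print(different, np.allclose(different, different[::-1]))
```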
CNN Classification
To convert vectors to images and classify the images via CNN, we used almost the same code that Ignacio Oguiza shared on the fast.ai forum: Time series - Olive oil country.
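Oguiza's notebook is the reference for the exact vector-to-image encoding. We have not confirmed which encoding it uses. As a hedged sketch, one widely used way to turn a 1-D sequence into a 2-D image is the Gramian Angular Summation Field (GASF). Libraries such as pyts provide a `GramianAngularField` transformer for this; the version below is written directly in NumPy. With this encoding, a mirror sequence produces an image that is unchanged when both axes are flipped, which gives the CNN a visible "Same" pattern.

```python
import numpy as np

def gasf(series):
    """Gramian Angular Summation Field of a 1-D sequence.

    Min-max rescale to [-1, 1], map each value to an angle phi = arccos(x),
    and return the matrix G[i, j] = cos(phi_i + phi_j).
    """
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    # A constant sequence cannot be min-max scaled; map it to zeros instead.
    x = 2 * (x - lo) / (hi - lo) - 1 if hi > lo else np.zeros_like(x)
    phi = np.arccos(np.clip(x, -1.0, 1.0))  # clip guards against rounding error
    return np.cos(phi[:, None] + phi[None, :])

v = np.array([0.3, -0.2, 0.5])           # hypothetical word vector
same = np.concatenate([v, v[::-1]])      # mirror ("Same") sequence
img = gasf(same)
print(img.shape)
# A mirror sequence yields an image that is unchanged when both axes are flipped.
print(np.allclose(img, img[::-1, ::-1]))
```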
We split the source file into words={pairType, word} and vector. The 'pairType' column was used to define the "Same" or "Different" category of each image, and the 'word' column to identify the word pair.
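The split into {pairType, word} and vector might look like this. The comma-separated row layout `pairType,word,x1,x2,...` is an assumption, since the post does not show the file format. The rows themselves are hypothetical.

```python
import numpy as np

def split_rows(lines):
    """Split assumed 'pairType,word,x1,x2,...' lines into labels, pair names, and a vector matrix."""
    labels, names, vectors = [], [], []
    for line in lines:
        pair_type, word, *values = line.strip().split(",")
        labels.append(pair_type)   # "Same" or "Different": the image category
        names.append(word)         # e.g. "word1~word2": identifies the pair
        vectors.append([float(x) for x in values])
    return labels, names, np.array(vectors)

lines = [
    "Same,alpha~alpha,0.1,0.2,0.2,0.1",
    "Different,alpha~beta,0.1,0.2,0.4,0.3",
]
labels, names, X = split_rows(lines)
print(labels, names, X.shape)
```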
After tuning the classification model we got about 96% accuracy. Here is the code to display the results:
Examples: "Mirror" Word Pairs
Word pair - 'explanations~explanations':
Word pair - 'requirements~requirements':
Word pair - 'element~element':
Examples: Pairs of Similar Words
Word pair - 'thoughts~feelings':
Word pair - 'source~basic':
Word pair - 'eventually~conclusion':
Examples: Unexpected Free Associations
Word pair - 'personality~development':
Word pair - 'societal~restrictions':
Word pair - 'contingents~accompanying':
Word pair - 'neurotic~symptoms':
Word pair - 'later~explicitly':
Word pair - 'theory~published':
Next Post - Associations and Deep Learning
In the next post we will take a deeper look at deep learning for data associations.