In this post we will look at two ways to find word neighbors: via the Word2Vec model and via the Word2Vec2Graph model.
Two Connected Components with Page Rank
Here are the results from the previous post, where we joined the two largest connected components with their Page Rank scores.
Largest component:
Second-largest component:
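Joining components with Page Rank amounts to a join on the word: each word carries a component id and a Page Rank score. Below is a minimal plain-Python sketch (not the Spark/GraphFrames code from the post) of that join. It also keeps the highest-ranked word in each component, which is the word we explore next. The helper `top_word_per_component`, the words and all scores are hypothetical toy values, not the post's results.

```python
from typing import Dict, Tuple

def top_word_per_component(
    component: Dict[str, int], pagerank: Dict[str, float]
) -> Dict[int, Tuple[str, float]]:
    """Hypothetical helper: join component ids with Page Rank scores on the
    word and keep the highest-ranked word in each component."""
    best: Dict[int, Tuple[str, float]] = {}
    for word, comp_id in component.items():
        score = pagerank.get(word)
        if score is None:
            continue  # no Page Rank score: drop the word (inner-join semantics)
        if comp_id not in best or score > best[comp_id][1]:
            best[comp_id] = (word, score)
    return best

# Toy component ids and scores, invented for illustration only.
component = {"hormones": 1, "cortisol": 1, "processes": 2, "brain": 2, "cells": 2}
pagerank = {"hormones": 2.1, "cortisol": 1.3, "processes": 3.4, "brain": 1.0, "cells": 0.7}

print(top_word_per_component(component, pagerank))
# {1: ('hormones', 2.1), 2: ('processes', 3.4)}
```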
Find Neighbors via GraphFrames "Find" Function
For each of these connected components, we will look at the neighbors of the word with the highest Page Rank. To find word neighbors we will use the GraphFrames 'find' function.
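In GraphFrames, first-degree neighbors come from a motif query such as `g.find("(a)-[ab]->(b)")` filtered on `a.id`. The plain-Python sketch below reproduces that logic over an edge list so the idea is visible without Spark. The toy edges and similarity values are invented, and storing every word pair in both directions is an assumption of the sketch.

```python
from typing import List, Tuple

Edge = Tuple[str, str, float]  # (src, dst, cosine similarity)

# Toy edge list invented for illustration.
PAIRS: List[Edge] = [
    ("hormones", "cortisol", 0.81),
    ("processes", "cells", 0.79),
    ("processes", "brain", 0.77),
    ("cells", "brain", 0.76),
]
# Assumption: each pair is stored in both directions, so following
# outgoing edges is enough to find every neighbor.
EDGES: List[Edge] = PAIRS + [(dst, src, sim) for src, dst, sim in PAIRS]

def neighbors(edges: List[Edge], word: str) -> List[Tuple[str, float]]:
    """Plain-Python equivalent of the motif "(a)-[ab]->(b)" with a.id == word."""
    return [(dst, sim) for src, dst, sim in edges if src == word]

print(neighbors(EDGES, "processes"))  # [('cells', 0.79), ('brain', 0.77)]
print(neighbors(EDGES, "hormones"))   # [('cortisol', 0.81)]
```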
Word "hormones":
Word "processes":
Find Neighbors via Word2Vec Model
Another way to find word neighbors is similar to the Word2Vec 'findSynonyms' function. Here is a function that works on a matrix of Word2Vec cosine similarities between words from the text file (Stress Data File). It takes two parameters: a cosine similarity threshold and the number of similar words to return.
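Spark's `Word2VecModel.findSynonyms(word, num)` returns the `num` closest words but has no threshold. Below is a rough plain-Python sketch of a helper that takes both parameters. The name `find_similar` and the 2-D toy vectors are hypothetical; real Word2Vec vectors have many more dimensions.

```python
import math
from typing import Dict, List, Tuple

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either one is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def find_similar(vectors: Dict[str, List[float]], word: str,
                 threshold: float, n: int) -> List[Tuple[str, float]]:
    """Hypothetical findSynonyms-style helper: up to n words whose cosine
    similarity to `word` is above `threshold`, most similar first."""
    target = vectors[word]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    hits = [(w, s) for w, s in scored if s > threshold]
    hits.sort(key=lambda ws: ws[1], reverse=True)
    return hits[:n]

# Toy 2-D "embeddings", invented for illustration.
VECTORS = {
    "processes": [1.0, 0.0],
    "cells": [0.9, 0.1],
    "brain": [0.8, 0.3],
    "stress": [0.0, 1.0],
}

# "cells" first, then "brain"; "stress" falls below the threshold.
print(find_similar(VECTORS, "processes", threshold=0.9, n=10))
```

Sorting every score is fine for a small vocabulary. For a large one, `heapq.nlargest` would avoid the full sort.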
Word "processes" neighbors: we use the same threshold that we used to build the graph, and we get the same results as with the GraphFrames 'find' function:
Word "hormones" neighbors:
Finding Neighbors of Neighbors
Now suppose we need to find neighbors of neighbors, i.e. words two degrees of separation away. This is not easy with functions similar to the Word2Vec 'findSynonyms' function, but the GraphFrames 'find' function handles such problems elegantly.
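A two-hop motif in GraphFrames looks like `g.find("(a)-[]->(b); (b)-[]->(c)")`, filtered on `a.id` and on `c.id` being different from `a.id`. The sketch below reproduces that logic in plain Python on a toy graph. The graph is invented to mimic the pattern described in this post and is not the post's data. The last lines count unordered `{b, c}` pairs to show how a combination can appear twice.

```python
from collections import Counter
from typing import List, Tuple

# Toy undirected graph, invented for illustration.
PAIRS: List[Tuple[str, str]] = [
    ("hormones", "cortisol"), ("cortisol", "adrenaline"),
    ("processes", "cells"), ("processes", "brain"), ("cells", "brain"),
    ("processes", "body"), ("processes", "functions"), ("body", "functions"),
]
# Assumption: every pair is stored in both directions.
EDGES = PAIRS + [(b, a) for a, b in PAIRS]

def neighbors_of_neighbors(edges: List[Tuple[str, str]],
                           word: str) -> List[Tuple[str, str, str]]:
    """Plain-Python equivalent of "(a)-[]->(b); (b)-[]->(c)"
    with a.id == word and c.id != word."""
    return [
        (a, b, c)
        for a, b in edges if a == word
        for b2, c in edges if b2 == b and c != word
    ]

print(neighbors_of_neighbors(EDGES, "hormones"))
# [('hormones', 'cortisol', 'adrenaline')]

paths = neighbors_of_neighbors(EDGES, "processes")
# Count each {b, c} pair regardless of order: a count of 2 means the
# two words were reached from each other in both directions.
pair_counts = Counter(frozenset((b, c)) for _, b, c in paths)
print(sorted(tuple(sorted(p)) for p, k in pair_counts.items() if k == 2))
# [('body', 'functions'), ('brain', 'cells')]
```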
Word "processes" neighbors of neighbors:
Word "hormones" neighbors of neighbors:
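The word "hormones" has only one second-degree neighbor, while the word "processes" has several, and some word combinations appear twice:
This shows that two triangles are attached to the word "processes": when a pair of words appears twice, once in each order, both words are direct neighbors of "processes" and are also connected to each other.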
Triangles
First we will look at the GraphFrames 'triangleCount' function:
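GraphFrames' `triangleCount` reports, for each vertex, the number of triangles passing through it. The minimal plain-Python sketch below computes the same quantity on a toy graph. It treats edges as undirected and counts, for every vertex, the pairs of its neighbors that are themselves connected. The graph and helper names are invented for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Toy undirected graph, invented for illustration.
PAIRS: List[Tuple[str, str]] = [
    ("hormones", "cortisol"), ("cortisol", "adrenaline"),
    ("processes", "cells"), ("processes", "brain"), ("cells", "brain"),
    ("processes", "body"), ("processes", "functions"), ("body", "functions"),
]

def adjacency(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; direction and duplicate edges are ignored."""
    adj: Dict[str, Set[str]] = {}
    for a, b in pairs:
        if a != b:  # skip self-loops
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def triangle_count(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Triangles through each vertex = connected pairs among its neighbors."""
    adj = adjacency(pairs)
    return {
        v: sum(1 for x, y in combinations(sorted(nbrs), 2) if y in adj[x])
        for v, nbrs in adj.items()
    }

counts = triangle_count(PAIRS)
print(counts["processes"], counts["cells"], counts["hormones"])  # 2 1 0
```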
To see the word combinations that form triangles, we will use the 'find' function:
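A triangle motif can be written as `g.find("(a)-[]->(b); (b)-[]->(c); (c)-[]->(a)")`. Such a query may return the same triangle several times, once per rotation and direction. The plain-Python sketch below lists each triangle once by sorting its three words. The toy graph is invented and is not the post's data.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Toy undirected graph, invented for illustration.
PAIRS: List[Tuple[str, str]] = [
    ("hormones", "cortisol"), ("cortisol", "adrenaline"),
    ("processes", "cells"), ("processes", "brain"), ("cells", "brain"),
    ("processes", "body"), ("processes", "functions"), ("body", "functions"),
]

def adjacency(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; direction and duplicate edges are ignored."""
    adj: Dict[str, Set[str]] = {}
    for a, b in pairs:
        if a != b:  # skip self-loops
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def triangles(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Each triangle exactly once, as an alphabetically sorted word triple."""
    adj = adjacency(pairs)
    found: Set[Tuple[str, str, str]] = set()
    for a, nbrs in adj.items():
        for b, c in combinations(sorted(nbrs), 2):
            if c in adj[b]:  # b and c are connected, closing the triangle
                x, y, z = sorted((a, b, c))
                found.add((x, y, z))
    return sorted(found)

print(triangles(PAIRS))
# [('body', 'functions', 'processes'), ('brain', 'cells', 'processes')]
```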
Next Post - Direct Graph
In the next post we will look at direct Word2Vec2Graph graphs.