In the previous post we explained how to build a Word2Vec2Graph model. The Word2Vec2Graph model was built on a small Stress Data File, a text file about stress that we extracted from Wikipedia. We used words from the Stress Data File as vertices and Word2Vec cosine similarities as edge weights. The Word2Vec model was trained on a combined corpus of News data and Wiki data. The Word2Vec2Graph model combines the Word2Vec model with the Spark GraphFrames library, which has many interesting functions. In this post we will look at the Page Rank function.
Get Data from Data Storage
Read word-to-word combinations from the Stress Data File, with their Word2Vec cosine similarities:
Read the vertices and edges of Word2Vec2Graph and build a graph:
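GraphFrames builds a graph from a vertex DataFrame with an `id` column and an edge DataFrame with `src` and `dst` columns, plus any extra edge attributes such as a weight. The plain-Python sketch below mirrors that structure without Spark. The toy word vectors and the `edgeWeight` field name are hypothetical, and it assumes every ordered pair of distinct words becomes an edge.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_graph(vectors):
    """Return (vertices, edges) for the full word-to-word graph.

    vectors: dict mapping word -> embedding vector.
    Each ordered pair of distinct words becomes an edge
    (src, dst, edgeWeight), echoing GraphFrames' src/dst columns.
    'edgeWeight' is a hypothetical name for the cosine similarity.
    """
    vertices = sorted(vectors)
    edges = [
        (src, dst, cosine(vectors[src], vectors[dst]))
        for src, dst in itertools.permutations(vertices, 2)
    ]
    return vertices, edges


# Hypothetical 2-d toy vectors, not taken from the trained Word2Vec model.
toy_vectors = {
    "stress": [0.9, 0.1],
    "anxiety": [0.8, 0.3],
    "hormone": [0.2, 0.9],
}
vertices, edges = build_graph(toy_vectors)
print(len(vertices), len(edges))  # 3 vertices, 6 directed edges
```

Because cosine similarity is symmetric, the edges `(a, b)` and `(b, a)` carry the same weight.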
Page Rank
Calculate Page Rank:
Our graph is built on the full matrix, so every pair of words is connected. On such a graph every vertex sits in exactly the same position, and GraphFrames' pageRank does not use edge weights, so all Page Ranks come out equal to 1. Now we will look at the Page Rank of a subgraph built with an edge weight threshold. We will use the same threshold (>.075) as in the previous post, where we calculated the graph's connected components.
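To see why the full graph gives every word the same rank, here is a plain-Python sketch of unweighted PageRank in the unnormalized form that, as we understand it, GraphX (on which GraphFrames' `pageRank` is built) uses, with ranks starting at 1.0. It is an illustration, not the GraphFrames implementation; the `reset_probability` of 0.15, the fixed iteration count, and the words are assumptions.

```python
import itertools


def pagerank(vertices, edges, reset_probability=0.15, max_iter=20):
    """Unweighted, unnormalized PageRank sketch.

    rank(v) = reset_probability
              + (1 - reset_probability) * sum(rank(u) / out_degree(u))
    summed over all edges u -> v. Edge weights are ignored.
    """
    out_degree = {v: 0 for v in vertices}
    for src, _dst in edges:
        out_degree[src] += 1
    rank = {v: 1.0 for v in vertices}
    for _ in range(max_iter):
        incoming = {v: 0.0 for v in vertices}
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]
        rank = {
            v: reset_probability + (1 - reset_probability) * incoming[v]
            for v in vertices
        }
    return rank


# Hypothetical words; every ordered pair is an edge (a complete graph).
words = ["stress", "anxiety", "hormone", "cortisol"]
full_edges = list(itertools.permutations(words, 2))
ranks = pagerank(words, full_edges)
print({w: round(r, 6) for w, r in ranks.items()})  # every rank is 1.0
```

On a complete graph with n vertices each vertex receives 1/(n-1) of the rank of each of its n-1 neighbors, so a rank of 1.0 everywhere maps back to 0.15 + 0.85 * 1.0 = 1.0 on every iteration.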
Build subgraph:
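Recent GraphFrames releases offer `filterEdges` and `dropIsolatedVertices` for this kind of step. The sketch below does the same thing in plain Python on a hypothetical weighted edge list: it keeps edges with weight strictly above 0.075 and then drops words that are left without any edge.

```python
def threshold_subgraph(edges, threshold):
    """Keep edges whose weight is strictly above threshold,
    then keep only the vertices that still touch a kept edge."""
    kept_edges = [(s, d, w) for s, d, w in edges if w > threshold]
    kept_vertices = sorted(
        {s for s, _, _ in kept_edges} | {d for _, d, _ in kept_edges}
    )
    return kept_vertices, kept_edges


# Hypothetical weighted edges (src, dst, cosine similarity), both directions.
edges = [
    ("stress", "anxiety", 0.62),
    ("anxiety", "stress", 0.62),
    ("stress", "hormone", 0.05),
    ("hormone", "stress", 0.05),
    ("cortisol", "hormone", 0.41),
    ("hormone", "cortisol", 0.41),
    ("stress", "weather", 0.01),
    ("weather", "stress", 0.01),
]
vertices, sub_edges = threshold_subgraph(edges, 0.075)
print(vertices)  # ['anxiety', 'cortisol', 'hormone', 'stress']; 'weather' is dropped
```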
Calculate Page Rank:
Page Rank and Degrees
The graph we use now is undirected, so vertices with high Page Rank largely coincide with vertices with high in-degree:
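A small plain-Python check of this claim on a hypothetical undirected graph, where each undirected word pair is stored as two directed edges. It includes an unweighted, unnormalized PageRank sketch, which is an assumption about how GraphFrames computes ranks rather than its actual code.

```python
def undirected_edges(pairs):
    """Store each undirected pair as two directed edges."""
    return [e for a, b in pairs for e in ((a, b), (b, a))]


def in_degrees(vertices, edges):
    """Count incoming edges per vertex."""
    deg = {v: 0 for v in vertices}
    for _src, dst in edges:
        deg[dst] += 1
    return deg


def pagerank(vertices, edges, reset_probability=0.15, max_iter=50):
    """Unweighted, unnormalized PageRank sketch (ranks start at 1.0)."""
    out_degree = {v: 0 for v in vertices}
    for src, _dst in edges:
        out_degree[src] += 1
    rank = {v: 1.0 for v in vertices}
    for _ in range(max_iter):
        incoming = {v: 0.0 for v in vertices}
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]
        rank = {
            v: reset_probability + (1 - reset_probability) * incoming[v]
            for v in vertices
        }
    return rank


# Hypothetical undirected word graph.
words = ["stress", "anxiety", "cortisol", "sleep"]
edges = undirected_edges([
    ("stress", "anxiety"),
    ("stress", "cortisol"),
    ("stress", "sleep"),
    ("anxiety", "sleep"),
])
deg = in_degrees(words, edges)
ranks = pagerank(words, edges)
print(max(deg, key=deg.get), max(ranks, key=ranks.get))  # stress stress
print(min(deg, key=deg.get), min(ranks, key=ranks.get))  # cortisol cortisol
```

The stationary distribution of a plain random walk on a connected undirected graph is proportional to vertex degree, and the reset step only perturbs it, which is why the two rankings tend to line up here.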
In future posts we will look at a directed Word2Vec2Graph, where the results will be different.
Page Rank and Connected Components
Connected components:
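GraphFrames computes this with `connectedComponents()`, which labels every vertex with a numeric `component` id and requires a Spark checkpoint directory to be set. Below is a plain-Python sketch of the same idea using breadth-first search on hypothetical words; as a simplification, each component is labeled by its alphabetically smallest word instead of a numeric id.

```python
from collections import defaultdict, deque


def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex name in its component,
    treating every edge as undirected."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    component = {}
    # Visiting vertices in sorted order means the first vertex reached
    # in each component is its smallest member, so it becomes the label.
    for start in sorted(vertices):
        if start in component:
            continue
        component[start] = start
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in neighbors[v]:
                if w not in component:
                    component[w] = start
                    queue.append(w)
    return component


# Hypothetical thresholded subgraph with two components.
words = ["anxiety", "cortisol", "hormone", "sleep", "stress"]
edges = [("stress", "anxiety"), ("cortisol", "hormone"), ("sleep", "stress")]
components = connected_components(words, edges)
print(sorted(set(components.values())))  # ['anxiety', 'cortisol']
```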
Combine the two connected components with Page Rank. The biggest component:
The second component:
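Combining the two results amounts to joining each word's component label with its Page Rank on the word id, then ranking words within each component. A plain-Python sketch of that join, using made-up component labels and rank values that are placeholders, not the results from this post:

```python
from collections import Counter


def components_by_rank(component, rank):
    """Group vertices by component label, largest component first,
    and sort each component's vertices by descending rank
    (ties broken alphabetically)."""
    sizes = Counter(component.values())
    ordered_labels = sorted(sizes, key=lambda label: (-sizes[label], label))
    return [
        sorted(
            (v for v, c in component.items() if c == label),
            key=lambda v: (-rank[v], v),
        )
        for label in ordered_labels
    ]


# Placeholder inputs: component label and Page Rank per word.
component = {"stress": 1, "anxiety": 1, "sleep": 1, "cortisol": 2, "hormone": 2}
rank = {"stress": 1.47, "anxiety": 0.98, "sleep": 0.95, "cortisol": 0.90, "hormone": 1.10}

biggest, second = components_by_rank(component, rank)
print(biggest)  # ['stress', 'anxiety', 'sleep']
print(second)   # ['hormone', 'cortisol']
```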
Next Post - Word Neighbors
In the next post we will look at word neighbors for Word2Vec2Graph.