### Read Word Pairs Graph

In one of our previous posts we built and saved a *Word2Vec2Graph for pairs of words* from the Stress Data file. In this post we will look for topics in the Stress Data file.

Read the stored vertices and edges and rebuild the graph:

### Connected Components with Cosine Similarity Thresholds

In the previous post we showed that in the graph of all word pairs, almost all pairs fall within the same large connected component. To find topics in that post we used the Label Propagation algorithm. In this post we will look at connected components with thresholds on Word2Vec cosine similarity.

As input parameters for this function we will use the graph vertices and edges, the minimum and maximum edge weight, and the minimum and maximum connected component size.
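As a rough illustration of the idea, here is a minimal plain-Python sketch. It is not the function used in this post: the name `connected_components`, the union-find approach, the inclusive thresholds, and the toy edge weights are all assumptions made for the example.

```python
from collections import defaultdict


def connected_components(edges, min_weight, max_weight, min_size, max_size):
    """Return rows of (component_id, component_size, src, dst, weight).

    edges: iterable of (src, dst, weight) tuples.
    Thresholds are treated as inclusive (an assumption of this sketch).
    """
    # Keep only edges whose cosine similarity is inside the weight window.
    kept = [(s, d, w) for s, d, w in edges if min_weight <= w <= max_weight]

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, d, _ in kept:
        rs, rd = find(s), find(d)
        if rs != rd:
            # Use the smaller label as the root so component ids are stable.
            parent[max(rs, rd)] = min(rs, rd)

    # Count vertices per component.
    sizes = defaultdict(int)
    for v in list(parent):
        sizes[find(v)] += 1

    rows = []
    for s, d, w in kept:
        comp = find(s)
        if min_size <= sizes[comp] <= max_size:
            rows.append((comp, sizes[comp], s, d, w))
    return rows


# Hypothetical toy edges, not values from the Stress Data file.
toy_edges = [
    ("stress", "anxiety", 0.8),
    ("anxiety", "depression", 0.7),
    ("sleep", "rest", 0.6),
    ("stress", "sleep", 0.2),
]
toy_rows = connected_components(toy_edges, 0.5, 1.0, 3, 10)
```

With these toy edges, the low-similarity pair is dropped by the weight window and the two-word component is dropped by the size window, so only the three-word component remains.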

The function returns a DataFrame with connected component identifiers, connected component sizes, and the source, target, and weight of each edge. Example:

### Transform to .DOT Language

We described in our previous post how to build a graph in Gephi. This function creates a list of directed edges with labels in .DOT format. It takes the output of the w2v2gConnectedComponents function, a component id, and the minimum and maximum edge weight:

This example is based on the same parameters that we used for the w2v2gConnectedComponents example:

This topic does not look very interesting: it shows well-known word connections. Now we will look at the results based on different thresholds.
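The transformation can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the name `component_to_dot`, the row layout `(component_id, size, src, dst, weight)`, the two-decimal label, and the sample rows are hypothetical and not taken from the original function.

```python
def component_to_dot(rows, component_id, min_weight, max_weight):
    """Build DOT edge lines for one connected component.

    rows: iterable of (component_id, size, src, dst, weight) tuples.
    The weight window is inclusive (an assumption of this sketch).
    """
    lines = []
    for comp, _size, src, dst, weight in rows:
        if comp == component_id and min_weight <= weight <= max_weight:
            # A directed edge labelled with the rounded cosine similarity.
            lines.append(f'"{src}" -> "{dst}" [label="{weight:.2f}"];')
    return lines


# Hypothetical sample rows, not values from the Stress Data file.
sample_rows = [
    ("c1", 3, "stress", "anxiety", 0.8),
    ("c1", 3, "anxiety", "depression", 0.3),
    ("c2", 2, "sleep", "rest", 0.9),
]
dot_lines = component_to_dot(sample_rows, "c1", 0.5, 1.0)

# Wrapping the lines in a digraph block gives a complete DOT document.
dot_text = "digraph {\n" + "\n".join(dot_lines) + "\n}"
```

The resulting text can be saved to a `.dot` file and opened in a tool that reads the DOT language.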

### How to Find Topics?

To select potentially interesting connected components, we will first look at the three most highly connected word pairs of each component. We will start with connected components based on word pairs with cosine similarity >0.5.

We will select connected component '146028888067' and run the 'component2dot' function with different thresholds.
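One way to sketch this selection step in plain Python is shown below. It reads "most highly connected" as "highest cosine similarity", which is an assumption. The helper name `top_pairs_per_component` and the sample rows are likewise hypothetical.

```python
import heapq
from collections import defaultdict


def top_pairs_per_component(rows, n=3):
    """Return {component_id: [(weight, src, dst), ...]} with the n heaviest edges.

    rows: iterable of (component_id, size, src, dst, weight) tuples.
    """
    by_component = defaultdict(list)
    for comp, _size, src, dst, weight in rows:
        by_component[comp].append((weight, src, dst))
    # nlargest sorts tuples by weight first, so the heaviest edges come first.
    return {comp: heapq.nlargest(n, pairs) for comp, pairs in by_component.items()}


# Hypothetical sample rows, not values from the Stress Data file.
sample_rows = [
    ("c1", 5, "stress", "anxiety", 0.9),
    ("c1", 5, "anxiety", "depression", 0.6),
    ("c1", 5, "stress", "tension", 0.8),
    ("c1", 5, "tension", "pain", 0.7),
    ("c2", 2, "sleep", "rest", 0.55),
]
top = top_pairs_per_component(sample_rows)
```

Scanning the few strongest pairs of every component is a quick way to decide which component ids are worth converting to DOT and plotting.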

Word pairs with cosine similarity >0.5:

Here is the graph:

Word pairs with cosine similarity between 0.1 and 0.4:

Here is the graph for these topics:

We can see in this graph that associations between word pairs with low cosine similarity give us more new ideas than word pairs with high cosine similarity. To find more new associations we will look at connected components based on word pairs with lower cosine similarity.

We will start with connected component '68719476739', which has word pairs with the lowest cosine similarities. Graph DOT lines with the same thresholds:

Graph DOT lines with no thresholds:

Here are other topics that give us more ideas about stress relief. We run 'component2dot' with no cosine similarity thresholds: