Network permutation randomizes edges in a graph to remove signal. Metrics computed on permuted networks provide a baseline to evaluate the extent of signal contained in the network. Different types of permutations destroy different aspects of the information encoded by a network.
The method we've previously used selects two random edges and swaps the endpoints (labeled XSwap in ). This method preserves node degree while destroying edge specificity.
We adapted the edge swapping technique to hetnets by permuting each metaedge separately—edges are only swapped with other edges of the same type. We found the permutation yielded valuable insights on which aspects of the network were informative and the quality of our predictions .
We've subsequently migrated to neo4j, so are now looking to implement hetnet permutation in cypher. We tweeted this problem, and Michael Hunger started us on the right track.
We created a graphgist with potential implementations and cypher questions.
Partial Cypher solutions
We've made some headway implementing XSwap in Cypher. But first, here's the general algorithm we're aiming for:
Randomly select two relationships of the specified type
If valid, XSwap the two relationships
Repeat 1 and 2 until a certain number of swaps have succeeded
We initially attempted to implement the above steps in a single cypher query. However, as Michael Hunger explained, the eagerness behavior of cypher prevents looping 1 and 2. Instead of 1, 2, 1, 2, 1, 2, cypher will do 1, 1, 1, 2, 2, 2, which fails because our technique randomizes iteratively.
Therefore, we switched to python for managing steps 1 and 3, while keeping step 2 in cypher. This external method is now implemented in hetio. The function starts by retrieving the ids for all relationships of the specified type. Next, iteration begins:
Two relationships ids are randomly selected and sent as parameters to a cypher query.
If the swap is invalid, the cypher query returns no rows. Otherwise, the query returns the ids for the two created relationships and the python id list is updated.
There are two outstanding issues with this method:
First, retrieving ids for all relationships of a given type could become problematic for extremely abundant relationship types. Currently this is not a problem as we can retrieve over 1 million ids using py2neo without failure. If problematic, we could create an identity property to each relationship with an existence constraint for efficient lookup. Indexed property values are a practical must for this solution, yet are currently only available with Enterprise.
Second, the rule planner in 2.3.1 does not do an indexed lookup of relationship by id—instead it scans all relationships. The cost planner works as desired with a DirectedRelationshipByIdSeekPipe. However the 2.3.1 cost planner doesn't support write operations (3.0 is slated to add this functionality).
So in conclusion, we are close to a workable solution for permuting a neo4j hetnet. Depending on developments, we'll choose whether to adopt a solution discussed here or permute outside of neo4j with the legacy hetio functionality.