## Assessing the effectiveness of our hetnet permutations

We've previously discussed what hetnet permutation is and why we do it. To permute a hetnet, we go through each relationship type (metaedge) and repeatedly swap the target nodes of two random relationships (edges). This strategy is called XSwap [1].

We looked into performing the permutation in neo4j using cypher, but decided to stick with our python implementation since cypher's cost planner currently lacks the needed abilities.

## Implementation specifics

We closely followed the parameters from our previous study [2] and did the following:

• We created 5 permuted hetnets. The first permutated hetnet was created from the unpermuted hetnet; the second permutated hetnet was created from the first permutated hetnet; and so on until the fifth permutated hetnet was created from the fourth permutated hetnet. This iterative strategy is referred to as a Markov chain.
• To create each permuted hetnet, we separately permuted each metaedge. For a given metaedge, we attempted n XSwaps where n equals four times the number of edges (multipler = 4).
• Xswaps can be unsuccessful for several reasons. The same edge could have been randomly selected twice (referred to as same_edge). One or both of the potential new edges may already exist (duplicate or undirected_duplicate for select cases where a biderectional edge connects two nodes of the same type). One or both of the potential new edges may connect a node to itself (self_loop). In these instances, no swap is performed. In the future, we may switch to stopping a completing permutation after a certain number of successes rather than attempts.

## Assessing permutation effectiveness

For each permutation and each metaedge, we measure the progress of the randomization at 10 points (dataset). The measure we're primarily interested in is the percent of edges that are unchanged after a permutation (unchanged).

We find that the percent of unchanged edges varies by metaedge (notebook cell 4). It appears that we could safely reduce our multiplier from 4 to 2.5 and still generate permuted networks that are maximally diversified from their predecessor.

Of concern are metaedges where a high percentage of the edges do not change. This occurred when a high percentage of swaps resulted in already existing edges (notebook cell 6). Particularly troublesome was the Anatomy–expresses–Gene edge where almost all attempts yielded duplicated edges and only ~10% of edges changed from a permutation. I'm now inclined to revisit our previous observation that we're being too permissive regarding expression edge inclusion.

Metaedges whose edges do not change from permutation are limited in informativeness. Such edges hold little information besides their degree contribution to the nodes they connect. In the context of our expression edge, the problem is visible in the node degree distribution: most anatomies express 0 genes while a minority of anatomies express an extremely high number of genes (see the anatomy - expresses - gene panel on page 10).

Daniel Himmelstein Researcher

# Improved randomization of expression edges

We updated our method for extracting Anatomy–expresses–Gene relationships from Bgee. This update reduced the number of expression edges in our hetnet from 1,006,278 to 526,407. The number of genes with an expression edge went from 18,147 to 18,094. The number of anatomies with an expression edge went from 256 to 241.

The permutation of expression edges increased in effectiveness. Now ~25% of expression edges (as opposed to ~10% previously) change in a permuted hetnet. And this is in spite of fewer attempted swaps per permutation: I decreased the multiplier from 4 to 3 to reduce runtime.

Daniel Himmelstein Researcher

# Comparison to BiRewire

BiRewire is an R Bioconductor package for bipartite network randomization (source code, homepage) [1, 2]. Like our implementation, BiRewire performs randomization through edge swaps (i.e. XSwap in [3]), which they refer to as the "switching-steps".

BiRewire also accounts for edge-sign, which is essentially a second dimension of direction, that can be used to represent a binary edge attribute, such as activation versus inhibition [2]. From my understanding, you could engineer sign-awareness by permuting sign-positive and sign-negative edges separately. In other words, sign-positive edges can only be swapped with other sign-positive edges. In the hetio randomization implementation, we perform edge swaps only within metaedge type. We also make sure to respect directionality when present.

BiRewire also adopts a Markov chain approach, where permutation is done iteratively: the first permutation is created from the unpermuted hetnet, and the second permutation is created from the first permutation. BiRewire describes the conceptual appeal of this approach [2]:

In [1] we prove that, if executing a sufficiently large number of [switching-steps], the [switching-algorithm] can efficiently simulate samplings from the uniform distribution of all the possible bipartite networks with predefined node sets and prescribed node degrees. Therefore it can be used to obtain a rewired version of a network B that it is, on average, no more similar to B than are similar to each other two bipartite networks B₁ and B₂ sampled from the real uniform distribution of all the possible bipartite networks with the same node sets and node degrees of B.

The BiRewire team did some nice analytical work relating to how many swaps need to be performed by each iteration of the Markov chain [1]:

We analytically derived an approximate lower bound for the number of switching-steps to be performed by the switching-algorithm

$$N=\frac{N′}{200(1-d)} \ln⁡{\frac{(1-d)N′}{100}}$$

where d is the edge density of the original network, defined as the ratio between |E| and the number of edges of a fully connected bipartite graph with the same number of nodes in the two classes.

We took an empirical rather than analytical approach to ensure that we performed sufficient edge swaps (see above comments). However, we can compare our number of swaps to the minimum suggested by the BiRewire team for the most difficult metaedge to randomize (Anatomy–expresses–Gene). Using our empirical randomization statistics, we apply the formula above in Python:

import math
N_prime = 526407
d = N_prime / (18094 * 241)
N = N_prime / (200 * (1 - d)) * math.log(0.01 * (1 - d) * N_prime)
n_swaps = 1579220 * (1 - 0.90)
assert N <= n_swaps
print(round(N), round(n_swaps))

The minimum N (number of edge swaps) suggested by the BiRewire formula is 25264, whereas we performed approximately 158,000. Given that I'm not sure all of the assumptions of BiRewire's formulation apply to our case (e.g. "same number of nodes in the two classes"), I still recommend empirically investigating the decay curve of percent of edges changed, rather than relying exclusively on the formula. But nice to see the complementary research into network randomization!

Status: Completed
Views
78
Topics
Referenced by
Cite this as
Daniel Himmelstein (2016) Assessing the effectiveness of our hetnet permutations. Thinklab. doi:10.15363/thinklab.d178