Anyone doing machine learning is familiar with the open-source scikit-learn library, which offers an intuitive way to develop a variety of machine learning models. A similar library, scikit-network, is now available for machine learning on graphs. It offers a familiar API, an efficient representation of graphs, and a collection of fast algorithms. Since graph-based machine learning has been getting a lot of attention recently, I thought it would be a good idea to introduce the scikit-network library to my readers.
The first step is, of course, to install the library (for example, with pip install scikit-network). Having done that, let’s look at how graphs are created and displayed in the scikit-network (sk-network) library. We can create graphs in several ways:
1. By defining an adjacency matrix
2. Using an edge list
3. Loading an existing graph from
scikit-network represents a graph by its adjacency matrix, stored in SciPy's Compressed Sparse Row (CSR) format. Graphs are drawn as SVG (Scalable Vector Graphics) images. A simple example of creating a graph and displaying it is shown below.

    # Import all necessary libraries
    import sknetwork as skn
    import numpy as np
    import pandas as pd
    from IPython.display import SVG
    from scipy import sparse

    # Define an adjacency matrix to create a graph
    adjacency = np.array([[0, 1, 1, 0],
                          [1, 0, 1, 1],
                          [1, 1, 0, 0],
                          [0, 1, 0, 1]])
    # Convert the dense array to SciPy's CSR format
    adjacency = sparse.csr_matrix(adjacency)
Let’s display the graph that we have created.
    from sknetwork.visualization import svg_graph

    image = svg_graph(adjacency)
    SVG(image)

    from sknetwork.data import karate_club, miserables, movie_actor

    # Load the built-in karate club graph together with its metadata
    graph = karate_club(metadata=True)
    adjacency = graph.adjacency
    position = graph.position
    labels = graph.labels

    image = svg_graph(adjacency, position, labels=labels)
    SVG(image)
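The second option in the list above, an edge list, maps onto the same CSR representation. The following is a minimal sketch that uses only NumPy and SciPy rather than scikit-network's own loaders. The edge list and the name `edge_adjacency` are made up for illustration, and the graph is assumed to be undirected and unweighted.

```python
import numpy as np
from scipy import sparse

# Hypothetical undirected edge list on 4 nodes (illustration only)
edges = np.array([(0, 1), (0, 2), (1, 2), (1, 3)])
n_nodes = 4

rows, cols = edges[:, 0], edges[:, 1]
weights = np.ones(len(edges), dtype=int)

# One stored entry per listed edge, then symmetrize for an undirected graph
directed = sparse.csr_matrix((weights, (rows, cols)), shape=(n_nodes, n_nodes))
edge_adjacency = (directed + directed.T).tocsr()
edge_adjacency.sort_indices()

# CSR keeps three arrays: row pointers, column indices, and stored values
print(edge_adjacency.indptr)   # row i occupies indices[indptr[i]:indptr[i + 1]]
print(edge_adjacency.indices)  # column index of each stored entry
print(edge_adjacency.data)     # value (edge weight) of each stored entry
print(edge_adjacency.toarray())
```

Because the result is an ordinary SciPy CSR matrix, it should be usable in the same places as the `adjacency` matrix built from a dense array.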
The library has several built-in utility functions to gather basic properties of graphs. An example is shown below.
    from sknetwork import utils as ut

    deg = ut.get_degrees(adjacency)
    print(deg)
[16 9 10 6 3 4 4 4 5 2 3 1 2 5 2 2 2 2 2 3 2 2 2 5 3 3 2 4 3 4 4 6 12 17]
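For an unweighted graph, a node's degree is simply the sum of its row in the adjacency matrix. As a sanity check, here is a small sketch that computes this by hand for the four-node matrix defined earlier, using plain NumPy and SciPy. It is not the library's implementation, and the name `small_adjacency` is introduced only for this example.

```python
import numpy as np
from scipy import sparse

small_adjacency = sparse.csr_matrix(np.array([[0, 1, 1, 0],
                                              [1, 0, 1, 1],
                                              [1, 1, 0, 0],
                                              [0, 1, 0, 1]]))

# Row sums of the adjacency matrix; the self-loop on node 3 adds 1 to its sum
row_sums = np.asarray(small_adjacency.sum(axis=1)).ravel()
print(row_sums)

# In CSR form, consecutive differences of indptr count stored entries per row,
# which equals the row sum when every stored weight is 1
stored_per_row = np.diff(small_adjacency.indptr)
print(stored_per_row)
```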
Let’s look at functions related to graph topology. I will show the use of two functions: connected components and the clustering coefficient of a graph. Recall that a connected component of a graph is a maximal subgraph in which every node is reachable from every other node. The clustering coefficient captures the degree to which the neighbors of a given node link to each other.
    from sknetwork.topology import get_connected_components, get_clustering_coefficient

    # Component label of each node
    print(get_connected_components(adjacency))
    # Clustering coefficient, rounded to two decimals
    print(np.round(get_clustering_coefficient(adjacency), 2))
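To make the two definitions concrete, here is a hedged sketch on a made-up six-node graph, using SciPy's `connected_components` and a hand-written clustering computation in NumPy instead of the scikit-network functions. It reports both the per-node (local) coefficients and a single global coefficient, without assuming which variant the library function returns. The graph and all variable names are assumptions for illustration.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components

# Hypothetical graph: a triangle 0-1-2, a pendant node 3 attached to node 1,
# and a separate edge 4-5
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (4, 5)]
n_nodes = 6
dense = np.zeros((n_nodes, n_nodes), dtype=int)
for i, j in edges:
    dense[i, j] = dense[j, i] = 1

# Connected components: nodes sharing a label can reach each other
n_components, component_labels = connected_components(
    sparse.csr_matrix(dense), directed=False)
print(n_components, component_labels)

# Triangles through each node: the diagonal of A^3 counts closed 3-walks,
# and each triangle at a node is counted twice (once per direction)
triangles = np.diag(dense @ dense @ dense) // 2
degrees = dense.sum(axis=1)
neighbor_pairs = degrees * (degrees - 1) // 2

# Local coefficient: fraction of neighbor pairs that are themselves linked
# (taken as 0 for nodes with fewer than two neighbors)
local = np.divide(triangles, neighbor_pairs,
                  out=np.zeros(n_nodes), where=neighbor_pairs > 0)
print(np.round(local, 2))

# Global coefficient: linked neighbor pairs over all neighbor pairs
global_coefficient = triangles.sum() / neighbor_pairs.sum()
print(round(float(global_coefficient), 2))
```

Node 1 illustrates the idea well: it has three neighbors, but only one of the three possible pairs among them (nodes 0 and 2) is linked, so its local coefficient is one third.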
Scikit-network offers several algorithms for machine learning on graphs. Each algorithm is available as an object with methods similar to those found in scikit-learn. The example below illustrates this for clustering, also known as community detection, using the Louvain method on the karate club graph.

    from sknetwork.clustering import Louvain

    louvain = Louvain(random_state=13)
    labels = louvain.fit_predict(adjacency)  # Labels reflect community ids

    image = svg_graph(adjacency, labels=labels)
    SVG(image)
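The Louvain method searches for a partition with high modularity, a score that compares the number of edges inside communities with what would be expected if edges were placed at random with the same node degrees. The sketch below computes that score by hand for a made-up graph of two triangles joined by one edge. `modularity` is a helper written for this example, not a scikit-network function, and the graph is assumed to be undirected and unweighted.

```python
import numpy as np

def modularity(adjacency, labels):
    """Modularity of a partition of an undirected graph (illustrative helper)."""
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels)
    degrees = adjacency.sum(axis=1)
    total = degrees.sum()  # equals twice the number of edges
    same_community = labels[:, None] == labels[None, :]
    # Expected weight between i and j under a degree-preserving random model
    expected = np.outer(degrees, degrees) / total
    return float(((adjacency - expected) * same_community).sum() / total)

# Hypothetical graph: triangles 0-1-2 and 3-4-5 joined by the edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy = np.zeros((6, 6))
for i, j in edges:
    toy[i, j] = toy[j, i] = 1

q_split = modularity(toy, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(toy, [0, 0, 0, 0, 0, 0])  # everything in one community
print(round(q_split, 3), round(q_single, 3))
```

Splitting the graph along the bridging edge scores higher than lumping all nodes together, which is the kind of partition a modularity-based method is meant to favor.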
The library offers not only traditional graph machine learning objects and methods but also deep learning modules for graphs, including a graph convolutional classifier and GraphSAGE. I hope you will explore this library for your own use. The only shortcoming I found with scikit-network is its documentation. Improved documentation and a tutorial would go a long way toward making it popular.