Research | Network Big-Data Analysis (NBDA) Research Group

Network Big-Data Analysis (NBDA) Research Group

Research Projects

The following are projects being conducted in this research group. Additional new and ongoing projects will be updated periodically.

The problem to be solved:

Preferential attachment has been actively analyzed in the field of complex networks, but itis not theonly factor on network evolution. It is easy to observe that popular nodes do not remain popular forever, because various node characteristics can affect network growth while weakening the effect of preferential attachment. Although there are some static corrections of preferential attachments via the popularity effects suggested in 2018, the popularity effects of individual nodes still do not explain well in dynamics how the effect of preferential attachment becomes weaken.

There has been considerable interest in generalized network structure. Suppose that more than two nodes are connected in a relationship, it is better represented by multi-node relations rather than cliques in binary relations for data complexity reduction. There exist two mathematical structures for multi-node relations: hypergraph and simplicial complex. The former is a structure created by extending an edge to a hyperedge that consists of two or more nodes, and the latter one is a more complicated object, which can be viewed as the set of hyperedges that includes all subsets of hyperedges. However, there are few literatures utilizing either tool for network analysis, and there are none, to our best knowledge, theoretical development on the evolution mechanisms and network modeling in this manner.

Innovative ideas proposed:

We develop a novel statistical model that explains the preferential attachment reduction phenomenon using the inhomogeneity of nodes. We investigate the node properties that affect such reduction, and figure out the size of the effect.

We present a framework applicable to various networks to enable mutual comparison of the effect of node properties on the reduction of preferential attachment.

We propose a generative weighted hypergraph model with the preferential attachment, and we aim to investigate how the degree distribution changes when we allow distributional variabilities on the number of and the size of the hyperedges being connected.

We derive the exact degree distribution of the proposed model and we show its asymptotic power-law shape on the degree distribution of hypergraph model.

Potential impact of the work:

The preferential attachment reduction phenomenon will provide a better fit on the degree distribution of a real network, which in returns provides a better estimate on both extreme of the node degree, and they usually stand for important few and general majority.

The use of hypergraph model is a new advancement to network data analysis. There are many advantages of describing a graph using hypergraph in mathematics, and these advantages also exist in when we describe a network (e.g. scientific collaboration) using hypergraph network model.

The problem to be solved:

Betti numbers are the best-known indices to characterize the topology of a manifold. They are intuitively interpreted as the numbers of independent holes of specified dimensions. Counting the Betti numbers has become a common practice in computational topology and been undertaken by numerous efficient algorithms. However, explicitly identifying the independent holes of an arbitrary dimension in a manifold remains largely unexplored. Furthermore, when a large network or manifold possesses heterogeneous topological characteristics such as their dimensions and Betti numbers, then quantifying the distributions of these topological characteristics and charting them on the manifold becomes critical in the research of computational topology.

Clustering a network into different sub-network is a traditional practice in network data analysis. No matter if the traditional modularity or the scan statistics is applied to a large-scale network, the computational complexity is always too high for a standard computer to afford. There is a need to efficiently partition a network into sub-network roughly, which can serve as a starting procedure before the use of more sophisticated state-of-the-art approaches.

Innovative ideas proposed:

We develop algorithms to solve three problems: (1) identifying the independent holes of multiple dimensions in a large manifold/network; (2) subdividing the manifold/network into region with distinct topological characteristics; and (3) quantifying the statistical significance of the topological indices such as Betti numbers in those regions.

We develop a fast-clustering method that recursively utilizes a binary splitting algorithm to partition a network into sub-networks. It will further be parallelized using similar techniques in swarm intelligence.

Potential impact of the work:

The use of Betti numbers to characterize network structure is new and useful for studying network structure, given the current characteristics consists of simple items like degrees, density, cliques.

The new fast-clustering method is very important for clustering large-scale networks because most of the existing clustering methods are infeasible to such high-complexity algorithms.

The problem to be solved:

Network visualization is important to users who gain insights from the network structure. Most textbook examples show a network in a two-dimensional fashion, but it is obvious on the limitation of two-dimensional space to illustrate complexity in network structure, and thus recent state-of-the-art methods like self-organizing maps or stochastic approaches suggest to draw a network in three-dimensional space instead. Unfortunately, these methods always suffer some problems like sticking nodes when cluster exists or crossover edges through the center, which hinders these three-dimensional visualization methods for greater and more informative uses.

On the contrary, there exists a research area, namely uniform designs in experimental designs, that can allocate points in a space uniformly. With appropriate transformation, it is possible to allocate nodes on a three-dimensional space uniformly. However, such method assumes all nodes to be independent of one another, which implies no edges or clusters are possible to exist. Thus, there is a need to develop a new three-dimensional visualization method to allocate nodes on a three-dimensional space uniformly with the consideration of edges and clusters.

Innovative ideas proposed:

We develop a new algorithm, namely The Uniform Placement of Alters on Spherical Surface (U-PASS), to illustrate a network in a three-dimensional space with the consideration of alter edges and alter clusters. It can allocate alter nodes more uniformly than traditional methods in terms of some spherical metrics.

We extend the U-PASS to be used in a general network without a pre-defined ego node. The main improvement is the introduction of the second concentric spherical surface to allocate the indirect alter nodes in a network.

We further the U-PASS to be used in a general weighted network. Instead of two separate concentric spherical surfaces, we allow the allocations of alter nodes in the three-dimensional space between two concentric spherical surfaces, and the distances of these alter nodes to the center are calculated from their weights on the edges.

We develop a new criterion called the generalized spherical cap discrepancy (gSCD) for quantifying the uniformity of alter nodes on a collapsed spherical surface.

Potential impact of the work:

The introduction of U-PASS provides a better visualization experience for users to get insights from the structure of the network.

The improvement from the standard U-PASS broadens the use of this technique to more general networks.

The problem to be solved:

Past researches have focused on the structural characterization of networks and their generative mechanisms, whereas little is known about their diversity. This particular aspect of network analysis is important due to the fact a network is a manifestation of how nodes interact with each other, and the diversity of a network may be related to the macroscopic behavior of a complex system. Node relevance is another property that is seldomly mentioned in the network analysis, but when a large-scale network is given, this quantity helps in reducing the computational complexity originated from the large number of nodes. Node similarity considers how similar two nodes in a network is, which not only in terms of node attributes but also in terms of its structural configuration. Two similar nodes in both node attributes and structural configurations may imply some connections or similarities on the nodes' entities.

There may be other new node properties that are never considered in the past. The key difficulty in this project is to provide a good and interpretable definition to these new properties, and hopefully they have some connections to the existing node properties like various centralities.

Innovative ideas proposed:

We develop measures for assessing how diverse a network is based on (1) the notation of regular or structural equivalence, measuring the similarity between node positions in a network, (2) the concept of positional centrality where diversity is defined in terms of how diverse species centrality values are in a network, and (3) the average similarity in nodes' interaction profiles.

We develop measures for quantifying how relevance a node is towards the network structure. These measures are based on the difference of characteristics of the whole network and the reduced network with the target node is removed. They may include the change in degree distribution, the change in network connection density, the change in the number of n-nodes cycles, etc.

We develop measures for determining how similar two nodes are in a given network. The similarity of two nodes includes both the node attributes and the neighboring node structural configurations.

Potential impact of the work:

All three new properties, together with many new unknowns, will provide measures and descriptions of a network from different angles when compared to traditional properties like centrality. All new properties have their own measures, algorithms and methods, statistical interpretations and connections to the traditional properties, so practitioners can view their networks from various different points of view and obtain new insights from different perspectives.