Network Embedding for Role Discovery: Concepts, Tools, and Applications

Official website for the SDM 2022 tutorial

Description

Role discovery in networks (i.e., finding nodes that have similar functions, ties, or interactions with nodes in other positions, irrespective of their proximity or reachability in the network) has been studied for decades in quantitative sociology and has since become a fundamental problem in graph data mining. Recently, network embedding methods have been developed to model the structural roles of nodes in a latent feature space, bringing the power of representation learning to role discovery. In this tutorial, we will review the formalisms of structural roles in networks drawn from network science and social science, and cover many existing node embedding methods, analyzing the extent to which their technical objectives make them capable of discerning these structural roles. We will present common pitfalls in the evaluation of structural embedding methods, along with a comprehensive evaluation methodology that avoids these pitfalls and characterizes the strengths and weaknesses of different embedding methods, both in synthetic, controlled settings and in real-world applications. The applications will span both single-network and multi-network settings, including a large-scale industrial case study. Finally, we will introduce an extensible, public network embedding library that we have developed to make many structural embedding methods easy to use, giving attendees a hands-on opportunity to try out what they have learned.
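As a minimal illustration of the distinction between structural role and proximity (a toy sketch, not code from the tutorial's library), the snippet below describes each node by three hand-picked local features (degree, mean neighbor degree, maximum neighbor degree) and groups nodes whose features coincide. The graph, the feature choice, and the function names (`build_graph`, `structural_features`, `group_by_role`, `hop_distance`) are illustrative assumptions; the embedding methods covered in the tutorial learn far richer representations.

```python
from collections import defaultdict, deque

# Hypothetical toy graph: two 3-leaf stars (hubs 0 and 4) joined by the path 0-8-9-4.
EDGES = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7), (0, 8), (8, 9), (9, 4)]


def build_graph(edges):
    """Build undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def structural_features(adj):
    """Return (degree, mean neighbor degree, max neighbor degree) per node.

    These features depend only on the local wiring pattern around a node,
    not on where the node sits in the network.
    """
    feats = {}
    for node, nbrs in adj.items():
        nbr_degs = [len(adj[n]) for n in nbrs]
        feats[node] = (len(nbrs), sum(nbr_degs) / len(nbrs), max(nbr_degs))
    return feats


def group_by_role(feats):
    """Treat nodes with identical feature vectors as sharing a role."""
    roles = defaultdict(list)
    for node, f in feats.items():
        roles[f].append(node)
    return {f: sorted(nodes) for f, nodes in roles.items()}


def hop_distance(adj, src, dst):
    """Shortest-path length via breadth-first search (None if unreachable)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for n in adj[node]:
            if n not in seen:
                seen.add(n)
                queue.append((n, d + 1))
    return None


if __name__ == "__main__":
    adj = build_graph(EDGES)
    feats = structural_features(adj)
    print(group_by_role(feats))
    print("hops between hubs 0 and 4:", hop_distance(adj, 0, 4))
```

Here the two hubs, nodes 0 and 4, are three hops apart yet receive identical features, while leaf 1 is adjacent to hub 0 but lands in a different group. A proximity-based embedding would tend to do the opposite, placing 0 near its own leaves rather than near the distant hub 4.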

Resources

  • Slides for the tutorial are available in PowerPoint and PDF formats (the latter without some of the animations).
  • Code for an accompanying Python library that implements many role-based network embedding methods, together with synthetic and real-world benchmarks on which they can be evaluated.
  • Paper (published in TKDD 2021) presenting the results of an empirical study, facilitated by our codebase, on which this tutorial draws.

About the Tutorial Authors

Mark Heimann is a postdoctoral researcher in the Machine Intelligence Group at Lawrence Livermore National Laboratory. Before that, he completed his PhD in computer science at the University of Michigan, Ann Arbor. His research in graph data mining focuses on representation learning, its use in large-scale problems involving multiple networks, and graph-based formulations of scientific problems such as materials discovery and medical image analysis. His work has been published in several top-tier data mining and machine learning conferences and has received a Best Student Paper award at ICDM.

Junchen Jin is an MS student in Analytics at Northwestern University whose professional experience spans both academic research and industry. Previously, he was an undergraduate and post-baccalaureate researcher in the Graph Exploration and Mining at Scale (GEMS) Lab at the University of Michigan, led by Danai Koutra. His research has focused on evaluation methodology for structural node embeddings (published in a top-tier data mining journal, ACM TKDD), as well as the design of graph neural network models that are robust to adversarial attacks. He also led the development of the structural graph embedding library that will be presented in this tutorial.

Danai Koutra is an Associate Director of the Michigan Institute for Data Science (MIDAS) and a Morris Wellman Associate Professor in Computer Science and Engineering at the University of Michigan, where she leads the Graph Exploration and Mining at Scale (GEMS) Lab. Her research focuses on practical and scalable methods for large-scale real networks, and her interests include graph summarization, graph representation learning, knowledge graph mining, similarity and alignment, and anomaly detection. She has won an NSF CAREER award, an ARO Young Investigator award, the 2020 SIGKDD Rising Star Award, research faculty awards from Google, Amazon, Facebook, and Adobe, a Precision Health Investigator award, the 2016 ACM SIGKDD Dissertation award, and an honorable mention for the SCS Doctoral Dissertation Award (CMU). She holds one "rate-1" patent on bipartite graph alignment and has multiple papers in top data mining conferences, including 8 award-winning papers. She is the Secretary of the new SIAM Activity Group (SIAG) on Data Science, an Associate Editor of ACM TKDD, and a track co-chair for the "Social Network Analysis and Graph Algorithms" track at TheWebConf 2022, and she has served multiple times on the organizing committees of all the major data mining conferences (e.g., ACM SIGKDD, ACM WSDM, SIAM SDM, ECML/PKDD, ACM CIKM, IEEE ICDM). She has also co-organized 7 tutorials (2 of them at SDM) and 6 workshops.