NetworkX Graph Analysis

Overview

NetworkX is a Python library for creating, manipulating, and analyzing complex networks and graphs. It provides data structures for undirected, directed, and multi-edge graphs along with a comprehensive collection of graph algorithms, generators, and I/O utilities. Use NetworkX when working with relationship data in social networks, biological interaction networks, transportation systems, citation graphs, or any domain involving pairwise entity relationships.

When to Use

Analyzing protein-protein interaction networks, gene regulatory networks, or metabolic pathways
Computing centrality measures (degree, betweenness, PageRank) to identify important nodes
Finding shortest paths or optimal routes in transportation or communication networks
Detecting communities or clusters in social networks or co-expression data
Generating synthetic networks (scale-free, small-world, random) for simulation or null models
Reading and writing graph data in standard formats (GraphML, GML, edge lists, JSON)
Visualizing network topology with node/edge attribute mapping
Checking graph properties: connectivity, planarity, isomorphism, DAG structure
For large-scale graphs (100K+ nodes) where speed is critical, use igraph or graph-tool instead
For billion-edge graphs or GPU-accelerated analytics, use graph-tool with OpenMP or cuGraph
For graph neural networks and deep learning on graphs, use torch-geometric-graph-neural-networks

Prerequisites

Python packages: networkx, matplotlib, scipy, pandas, numpy
Optional: pydot or pygraphviz (Graphviz layouts)

pip install networkx matplotlib scipy pandas numpy

Quick Start

import networkx as nx

# Create a graph and add edges with weights
G = nx.karate_club_graph()
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 34, Edges: 78

# Compute centrality and find most central node
bc = nx.betweenness_centrality(G)
top_node = max(bc, key=bc.get)
print(f"Most central node: {top_node}, betweenness: {bc[top_node]:.3f}")

# Detect communities
from networkx.algorithms import community
comms = community.greedy_modularity_communities(G)
print(f"Communities found: {len(comms)}")

Core API

Module 1: Graph Creation and Types

import networkx as nx

# Undirected graph (most common)
G = nx.Graph()
G.add_node("protein_A", type="kinase", weight=1.5)
G.add_nodes_from(["protein_B", "protein_C"])
G.add_edge("protein_A", "protein_B", weight=0.9, interaction="phosphorylation")
G.add_edges_from([("protein_B", "protein_C"), ("protein_A", "protein_C")])
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 3, Edges: 3

# Directed graph (gene regulation, citations)
D = nx.DiGraph()
D.add_edges_from([("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneA")])
print(f"TF1 out-degree: {D.out_degree('TF1')}")  # 2

# MultiGraph (multiple relationship types between same nodes)
M = nx.MultiGraph()
M.add_edge("A", "B", key="binding", affinity=0.8)
M.add_edge("A", "B", key="regulation", effect="inhibition")
print(f"Edges between A-B: {M.number_of_edges('A', 'B')}")  # 2

Module 2: Node and Edge Operations

import networkx as nx
G = nx.karate_club_graph()

# Query structure
print(f"Degree of node 0: {G.degree(0)}")
print(f"Neighbors of node 0: {list(G.neighbors(0))[:5]}")
print(f"Has edge 0-1: {G.has_edge(0, 1)}")

# Set and get attributes
G.nodes[0]["role"] = "instructor"
nx.set_node_attributes(G, {0: "high", 33: "high"}, "importance")
G[0][1]["weight"] = 0.95

# Iterate with data
for u, v, data in G.edges(data=True):
    if "weight" in data:
        print(f"  Edge {u}-{v}: weight={data['weight']}")
        break

# Subgraphs (returns read-only view; use .copy() for mutable)
H = G.subgraph([0, 1, 2, 3, 4, 5]).copy()
print(f"Subgraph: {H.number_of_nodes()} nodes, {H.number_of_edges()} edges")

Module 3: Graph Analysis (Centrality)

import networkx as nx
G = nx.karate_club_graph()

degree_c = nx.degree_centrality(G)
between_c = nx.betweenness_centrality(G, weight="weight")
# For large graphs, approximate: nx.betweenness_centrality(G, k=100)
close_c = nx.closeness_centrality(G)
eigen_c = nx.eigenvector_centrality(G, max_iter=1000)
pr = nx.pagerank(G, alpha=0.85)

# Compare top nodes across measures
for name, metric in [("Degree", degree_c), ("Betweenness", between_c),
                     ("Closeness", close_c), ("PageRank", pr)]:
    top = max(metric, key=metric.get)
    print(f"{name:12s}: top node={top}, score={metric[top]:.4f}")

Module 4: Path and Connectivity

import networkx as nx
G = nx.karate_club_graph()

# Shortest path
path = nx.shortest_path(G, source=0, target=33)
length = nx.shortest_path_length(G, source=0, target=33)
print(f"Shortest path 0->33: {path} (length {length})")
print(f"Average shortest path length: {nx.average_shortest_path_length(G):.3f}")

# Connected components
print(f"Connected: {nx.is_connected(G)}")
components = list(nx.connected_components(G))
print(f"Components: {len(components)}, largest: {len(max(components, key=len))}")

# For directed graphs: strong/weak connectivity
D = nx.DiGraph([(0,1),(1,2),(2,0),(3,4)])
print(f"Strongly connected: {list(nx.strongly_connected_components(D))}")

# Connectivity measures
print(f"Node connectivity: {nx.node_connectivity(G)}")
print(f"Edge connectivity: {nx.edge_connectivity(G)}")

Module 5: Community Detection

Partition networks into densely connected groups.

import networkx as nx
from networkx.algorithms import community
import itertools

G = nx.karate_club_graph()

# Greedy modularity maximization
comms_greedy = community.greedy_modularity_communities(G)
mod_score = community.modularity(G, comms_greedy)
print(f"Greedy: {len(comms_greedy)} communities, modularity={mod_score:.4f}")

# Label propagation (fast, non-deterministic)
comms_lpa = community.label_propagation_communities(G)
print(f"Label propagation: {len(list(comms_lpa))} communities")

# Girvan-Newman (hierarchical, edge betweenness removal)
gn = community.girvan_newman(G)
# Get first level of partition
first_level = next(gn)
print(f"Girvan-Newman first split: {len(first_level)} groups")
print(f"  Sizes: {[len(c) for c in first_level]}")

Module 6: I/O and Serialization

import networkx as nx
import pandas as pd
import json

G = nx.karate_club_graph()

# Edge list (simple text format)
nx.write_edgelist(G, "karate.edgelist")
G_loaded = nx.read_edgelist("karate.edgelist", nodetype=int)

# GraphML (preserves all attributes, XML-based)
nx.write_graphml(G, "karate.graphml")
G_xml = nx.read_graphml("karate.graphml")

# JSON (node-link format, web-friendly for d3.js)
data = nx.node_link_data(G)
with open("karate.json", "w") as f:
    json.dump(data, f)

# Pandas integration
df = pd.DataFrame({"source": [1,2,3], "target": [2,3,4], "weight": [0.5,1.0,0.75]})
G_pd = nx.from_pandas_edgelist(df, "source", "target", edge_attr="weight")
df_out = nx.to_pandas_edgelist(G_pd)
print(f"Pandas round-trip: {len(df_out)} edges")

# NumPy/SciPy matrices
A = nx.to_numpy_array(G)
print(f"Adjacency matrix shape: {A.shape}")
A_sparse = nx.to_scipy_sparse_array(G, format="csr")  # Memory-efficient

Module 7: Visualization

import networkx as nx
import matplotlib.pyplot as plt

G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)

# Color by degree, size by betweenness centrality
bc = nx.betweenness_centrality(G)
fig, ax = plt.subplots(figsize=(10, 8))
nx.draw(G, pos=pos, ax=ax,
        node_color=[G.degree(n) for n in G.nodes()], cmap=plt.cm.viridis,
        node_size=[3000 * bc[n] + 100 for n in G.nodes()],
        edge_color="gray", alpha=0.8, with_labels=True, font_size=8)
plt.tight_layout()
plt.savefig("network.png", dpi=300, bbox_inches="tight")
plt.savefig("network.pdf", bbox_inches="tight")  # Vector format
print("Saved network.png and network.pdf")

Module 8: Generators

import networkx as nx

# Erdos-Renyi random graph: n nodes, edge probability p
G_er = nx.erdos_renyi_graph(n=200, p=0.05, seed=42)
print(f"ER: {G_er.number_of_nodes()} nodes, {G_er.number_of_edges()} edges")

# Barabasi-Albert scale-free (power-law degree distribution)
G_ba = nx.barabasi_albert_graph(n=200, m=3, seed=42)

# Watts-Strogatz small-world
G_ws = nx.watts_strogatz_graph(n=200, k=6, p=0.1, seed=42)
print(f"WS clustering: {nx.average_clustering(G_ws):.3f}")

# Stochastic block model (community structure)
sizes, probs = [50, 50, 50], [[0.25,0.05,0.02],[0.05,0.35,0.07],[0.02,0.07,0.40]]
G_sbm = nx.stochastic_block_model(sizes, probs, seed=42)

# Built-in datasets and classic graphs
G_karate = nx.karate_club_graph()       # Zachary's karate club
G_grid = nx.grid_2d_graph(5, 7)         # 2D lattice
G_tree = nx.random_tree(n=50, seed=42)  # Random tree
G_geo = nx.random_geometric_graph(n=100, radius=0.2, seed=42)
# See references/algorithms_generators.md for full generator catalog

Key Concepts

Graph Types

| Class | Directed | Multi-edge | Self-loops | Use Case | |-------|----------|------------|------------|----------| | Graph | No | No | Yes | Undirected networks: social, PPI | | DiGraph | Yes | No | Yes | Gene regulation, citations, web | | MultiGraph | No | Yes | Yes | Multiple relationship types | | MultiDiGraph | Yes | Yes | Yes | Transportation with routes |

Attribute Patterns

Attributes are stored as dictionaries at graph, node, and edge levels:

import networkx as nx
G = nx.Graph(name="example")              # Graph-level attribute
G.add_node(1, label="hub", weight=1.5)    # Node attributes
G.add_edge(1, 2, weight=0.8, type="ppi")  # Edge attributes

# Bulk set/get
nx.set_node_attributes(G, {1: "red", 2: "blue"}, "color")
colors = nx.get_node_attributes(G, "color")  # {1: 'red', 2: 'blue'}

Layout Algorithms

| Layout | Function | Best For | |--------|----------|----------| | Spring (force-directed) | spring_layout(G, seed=42) | General networks | | Circular | circular_layout(G) | Regular graphs, cycles | | Kamada-Kawai | kamada_kawai_layout(G) | Small-medium networks | | Spectral | spectral_layout(G) | Highlighting clusters | | Shell (concentric) | shell_layout(G, nlist=[[...],[...]]) | Layered/hierarchical | | Planar | planar_layout(G) | Planar graphs only |

Common Workflows

Workflow 1: Social Network Analysis

Goal: Identify influential actors, detect communities, and visualize.

import networkx as nx
import matplotlib.pyplot as plt
from networkx.algorithms import community

# Step 1: Load network and basic stats
G = nx.karate_club_graph()
print(f"Network: {G.number_of_nodes()} actors, {G.number_of_edges()} ties")
print(f"Density: {nx.density(G):.4f}, Clustering: {nx.average_clustering(G):.4f}")

# Step 2: Identify influential nodes
bc = nx.betweenness_centrality(G)
top_bc = sorted(bc.items(), key=lambda x: x[1], reverse=True)[:5]
print("Top 5 by betweenness:", [(n, f"{s:.3f}") for n, s in top_bc])

# Step 3: Detect communities
comms = community.greedy_modularity_communities(G)
print(f"Communities: {len(comms)}, modularity: {community.modularity(G, comms):.4f}")

# Step 4: Visualize with community coloring
pos = nx.spring_layout(G, seed=42)
fig, ax = plt.subplots(figsize=(10, 8))
for i, comm in enumerate(comms):
    nx.draw_networkx_nodes(G, pos, nodelist=list(comm), ax=ax,
                           node_color=[plt.cm.Set2(i)]*len(comm), node_size=400)
nx.draw_networkx_edges(G, pos, ax=ax, alpha=0.3)
nx.draw_networkx_labels(G, pos, ax=ax, font_size=8)
plt.axis("off")
plt.tight_layout()
plt.savefig("social_network_analysis.png", dpi=300, bbox_inches="tight")
print("Saved social_network_analysis.png")

Workflow 2: Biological Interaction Network

Goal: Build a PPI network from tabular data, analyze topology, and identify hub proteins.

import networkx as nx
import pandas as pd

# Step 1: Load interaction data from DataFrame
interactions = pd.DataFrame({
    "protein_a": ["TP53","TP53","BRCA1","BRCA1","MDM2","ATM","ATM","CHEK2","RB1","CDK2"],
    "protein_b": ["MDM2","BRCA1","ATM","CHEK2","RB1","CHEK2","BRCA2","CDC25A","CDK2","CCNA2"],
    "score": [0.99, 0.95, 0.92, 0.88, 0.91, 0.97, 0.85, 0.90, 0.87, 0.93]
})
G = nx.from_pandas_edgelist(interactions, "protein_a", "protein_b",
                             edge_attr="score")
print(f"PPI network: {G.number_of_nodes()} proteins, {G.number_of_edges()} interactions")

# Step 2: Network statistics
print(f"Connected: {nx.is_connected(G)}")
print(f"Diameter: {nx.diameter(G)}")
print(f"Avg path length: {nx.average_shortest_path_length(G):.2f}")
print(f"Transitivity: {nx.transitivity(G):.4f}")

# Step 3: Hub identification (multiple centrality measures)
degree_c = nx.degree_centrality(G)
between_c = nx.betweenness_centrality(G)
close_c = nx.closeness_centrality(G)

results = pd.DataFrame({
    "protein": list(G.nodes()),
    "degree_centrality": [degree_c[n] for n in G.nodes()],
    "betweenness": [between_c[n] for n in G.nodes()],
    "closeness": [close_c[n] for n in G.nodes()],
}).sort_values("betweenness", ascending=False)
print("\nHub proteins:")
print(results.head(5).to_string(index=False))

# Step 4: Export for downstream analysis
nx.write_graphml(G, "ppi_network.graphml")
results.to_csv("protein_centrality.csv", index=False)
print("Exported ppi_network.graphml and protein_centrality.csv")

Key Parameters

| Parameter | Module | Default | Range / Options | Effect | |-----------|--------|---------|-----------------|--------| | weight | Paths/Centrality | None | Edge attribute name | Use weighted edges for path/centrality calculations | | alpha | pagerank | 0.85 | 0.0-1.0 | Damping factor; lower = more uniform distribution | | k | betweenness_centrality | None | int | Sample k nodes for approximation on large graphs | | max_iter | eigenvector_centrality | 100 | int | Max iterations for convergence | | seed | Generators/Layouts | None | int | Random seed for reproducibility | | n / p / m | ER/BA generators | varies | int/float | Node count, edge probability, edges per new node | | k / p | Watts-Strogatz | varies | int/float | Nearest neighbors, rewiring probability | | nodetype | read_edgelist | str | int, float, str | Type conversion for node identifiers | | edge_attr | from_pandas_edgelist | None | Column name(s) | Edge attribute columns to include from DataFrame | | format | to_scipy_sparse_array | "csc" | "csr", "csc", "coo" | Sparse matrix format |

Best Practices

Always set random seeds for reproducible generators and layouts: seed=42 in both erdos_renyi_graph() and spring_layout().
Use approximate algorithms for large graphs: nx.betweenness_centrality(G, k=500) samples k nodes instead of all pairs.
Prefer from_pandas_edgelist over manual add_edge loops for bulk data loading -- handles attributes cleanly and is faster.
Copy subgraphs before modification: G.subgraph(nodes) returns a read-only view; call .copy() for a mutable independent graph.
Use GraphML or GML for persistent storage to preserve all node/edge attributes. Edge lists lose metadata unless explicitly handled.
Convert graph types explicitly: D.to_undirected() (DiGraph -> Graph), nx.Graph(M) (MultiGraph -> Graph, collapses multi-edges).
Use sparse matrices for large adjacency exports: to_scipy_sparse_array() is far more memory-efficient than to_numpy_array().
Anti-pattern -- Don't use nx.info(): Deprecated; use G.number_of_nodes(), G.number_of_edges(), nx.density(G) directly.
Anti-pattern -- Don't assume node ordering: Algorithms may return results in different orders. Always index by node key, not position.

Common Recipes

Recipe: Minimum Spanning Tree

Extract the minimum spanning tree and compare to the original graph.

import networkx as nx

# Create weighted graph
G = nx.erdos_renyi_graph(50, 0.15, seed=42)
for u, v in G.edges():
    G[u][v]["weight"] = round(nx.utils.py_random_state(42).random(), 2)

mst = nx.minimum_spanning_tree(G, weight="weight")
print(f"Original: {G.number_of_edges()} edges")
print(f"MST: {mst.number_of_edges()} edges")
total_weight = sum(d["weight"] for _, _, d in mst.edges(data=True))
print(f"MST total weight: {total_weight:.2f}")

Recipe: Graph Coloring and Cliques

Find cliques and compute graph coloring.

import networkx as nx

G = nx.karate_club_graph()

# Find all maximal cliques
cliques = list(nx.find_cliques(G))
print(f"Maximal cliques: {len(cliques)}")
largest_clique = max(cliques, key=len)
print(f"Largest clique size: {len(largest_clique)}, nodes: {largest_clique}")

# Greedy graph coloring
coloring = nx.greedy_color(G, strategy="largest_first")
n_colors = max(coloring.values()) + 1
print(f"Chromatic number (greedy upper bound): {n_colors}")

Recipe: DAG and Topological Sort

Build a directed acyclic graph and find execution order.

import networkx as nx

# Task dependency DAG
D = nx.DiGraph()
D.add_edges_from([
    ("download_data", "preprocess"),
    ("download_data", "validate"),
    ("preprocess", "analyze"),
    ("validate", "analyze"),
    ("analyze", "visualize"),
    ("analyze", "report"),
    ("visualize", "report"),
])

print(f"Is DAG: {nx.is_directed_acyclic_graph(D)}")
order = list(nx.topological_sort(D))
print(f"Execution order: {order}")

# Find all paths from start to end
paths = list(nx.all_simple_paths(D, "download_data", "report"))
print(f"Paths to report: {len(paths)}")
for p in paths:
    print(f"  {' -> '.join(p)}")

Troubleshooting

| Problem | Cause | Solution | |---------|-------|----------| | NetworkXError: Graph is not connected | Algorithm requires connected graph | Extract largest component: G.subgraph(max(nx.connected_components(G), key=len)).copy() | | PowerIterationFailedConvergence | Eigenvector/PageRank did not converge | Increase max_iter (e.g., 1000) or check for disconnected components | | Very slow centrality computation | O(n*m) complexity on large graphs | Use k parameter for sampling: betweenness_centrality(G, k=500) | | nx.NetworkXNotImplemented | Algorithm not available for graph type | Convert graph type: G.to_undirected() or G.to_directed() | | Memory error on large graphs | Dense adjacency matrix | Use to_scipy_sparse_array() instead of to_numpy_array() | | Node IDs read as strings from file | read_edgelist defaults to str | Pass nodetype=int: nx.read_edgelist(f, nodetype=int) | | Community detection returns frozen sets | Normal return type for communities | Convert: [list(c) for c in communities] | | Self-loops in generated graphs | Configuration model allows self-loops | Remove: G.remove_edges_from(nx.selfloop_edges(G)) | | Visualization too cluttered | Too many nodes/edges | Filter to subgraph, adjust alpha, increase figure size, or use interactive tools (Plotly, PyVis) |

Bundled Resources

Migrated from original entry (STUB: 436-line main file + 2,014 lines across 5 reference files, main/total = 17.8%).

references/algorithms_generators.md

Covers: Detailed algorithm parameters for traversal (DFS/BFS), cycles, cliques, graph coloring, isomorphism, matching/covering, tree algorithms (MST variants). Full generator catalog: classic graphs, lattice/grid, tree, bipartite, degree sequence, graph operations (union, compose, complement, products). Relocated inline: Core algorithms (centrality, paths, connectivity, community, flow) -> Core API Modules 3-5. Core generators (ER, BA, WS, SBM) -> Module 8. Omitted: A* heuristic customization, Bellman-Ford negative weights -- consult official docs.

Original file disposition:

algorithms.md (383 lines): Top algorithms relocated to Core API Modules 3-5 + Recipes. Remaining (traversal, cliques, coloring, isomorphism, matching, cycles, trees) -> this reference.
generators.md (378 lines): Core generators relocated to Module 8. Full catalog (classic, lattice, tree, bipartite, degree sequence, operators) -> this reference.

references/io_visualization.md

Covers: All I/O formats (adjacency list, GEXF, Pajek, LEDA, Cytoscape JSON, DOT/Graphviz, Matrix Market, CSV, database/SQL, compressed gzip). Format selection guide. Advanced visualization: Plotly interactive, PyVis HTML, Graphviz layouts, 3D networks, bipartite layout, community coloring, subgraph highlighting, multi-panel figures, edge labels, directed arrows. Relocated inline: Core I/O (edge list, GraphML, JSON, pandas, NumPy/SciPy) -> Module 6. Basic matplotlib -> Module 7. Omitted: write_gpickle/read_gpickle (deprecated), read_shp/write_shp (removed in NetworkX 3.0; use geopandas).

Original file disposition:

io.md (441 lines): Core formats relocated to Module 6. Remaining formats + format selection guide -> this reference.
visualization.md (529 lines): Basic matplotlib relocated to Module 7. Advanced techniques (Plotly, PyVis, 3D, bipartite, community coloring) -> this reference.

Fully consolidated original file

graph-basics.md (283 lines): Fully consolidated into main SKILL.md. Graph types -> Key Concepts. Node/edge operations, attributes, subgraphs -> Core API Modules 1-2. Diagnostics -> Common Workflows. Memory/float-point considerations -> Best Practices + Troubleshooting. Omitted: nx.info() (deprecated).

Related Skills

torch-geometric-graph-neural-networks -- graph neural networks (GCN, GAT, GraphSAGE) for node/graph classification and link prediction on graph-structured data
matplotlib-scientific-plotting -- advanced figure customization beyond NetworkX's built-in nx.draw
plotly-interactive-plots -- interactive network plots with hover, zoom, and pan
pandas (planned) -- DataFrame operations for preparing edge/node data before graph construction
scipy (planned) -- sparse matrix operations and numerical algorithms used by NetworkX internally

References

NetworkX documentation -- official docs and API reference
NetworkX tutorial -- official getting started guide
NetworkX GitHub -- source code and issue tracker
NetworkX gallery -- example gallery with visualizations
Hagberg, A., Schult, D., & Swart, P. (2008). Exploring network structure, dynamics, and function using NetworkX. SciPy 2008.

NetworkX Graph Analysis

Overview

When to Use

Analyzing protein-protein interaction networks, gene regulatory networks, or metabolic pathways
Computing centrality measures (degree, betweenness, PageRank) to identify important nodes
Finding shortest paths or optimal routes in transportation or communication networks
Detecting communities or clusters in social networks or co-expression data
Generating synthetic networks (scale-free, small-world, random) for simulation or null models
Reading and writing graph data in standard formats (GraphML, GML, edge lists, JSON)
Visualizing network topology with node/edge attribute mapping
Checking graph properties: connectivity, planarity, isomorphism, DAG structure
For large-scale graphs (100K+ nodes) where speed is critical, use igraph or graph-tool instead
For billion-edge graphs or GPU-accelerated analytics, use graph-tool with OpenMP or cuGraph
For graph neural networks and deep learning on graphs, use torch-geometric-graph-neural-networks

Prerequisites

Python packages: networkx, matplotlib, scipy, pandas, numpy
Optional: pydot or pygraphviz (Graphviz layouts)

pip install networkx matplotlib scipy pandas numpy

Quick Start

import networkx as nx

# Create a graph and add edges with weights
G = nx.karate_club_graph()
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 34, Edges: 78

# Compute centrality and find most central node
bc = nx.betweenness_centrality(G)
top_node = max(bc, key=bc.get)
print(f"Most central node: {top_node}, betweenness: {bc[top_node]:.3f}")

# Detect communities
from networkx.algorithms import community
comms = community.greedy_modularity_communities(G)
print(f"Communities found: {len(comms)}")

Core API

Module 1: Graph Creation and Types

import networkx as nx

# Undirected graph (most common)
G = nx.Graph()
G.add_node("protein_A", type="kinase", weight=1.5)
G.add_nodes_from(["protein_B", "protein_C"])
G.add_edge("protein_A", "protein_B", weight=0.9, interaction="phosphorylation")
G.add_edges_from([("protein_B", "protein_C"), ("protein_A", "protein_C")])
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 3, Edges: 3

# Directed graph (gene regulation, citations)
D = nx.DiGraph()
D.add_edges_from([("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneA")])
print(f"TF1 out-degree: {D.out_degree('TF1')}")  # 2

# MultiGraph (multiple relationship types between same nodes)
M = nx.MultiGraph()
M.add_edge("A", "B", key="binding", affinity=0.8)
M.add_edge("A", "B", key="regulation", effect="inhibition")
print(f"Edges between A-B: {M.number_of_edges('A', 'B')}")  # 2

Module 2: Node and Edge Operations

import networkx as nx
G = nx.karate_club_graph()

# Query structure
print(f"Degree of node 0: {G.degree(0)}")
print(f"Neighbors of node 0: {list(G.neighbors(0))[:5]}")
print(f"Has edge 0-1: {G.has_edge(0, 1)}")

# Set and get attributes
G.nodes[0]["role"] = "instructor"
nx.set_node_attributes(G, {0: "high", 33: "high"}, "importance")
G[0][1]["weight"] = 0.95

# Iterate with data
for u, v, data in G.edges(data=True):
    if "weight" in data:
        print(f"  Edge {u}-{v}: weight={data['weight']}")
        break

# Subgraphs (returns read-only view; use .copy() for mutable)
H = G.subgraph([0, 1, 2, 3, 4, 5]).copy()
print(f"Subgraph: {H.number_of_nodes()} nodes, {H.number_of_edges()} edges")

Module 3: Graph Analysis (Centrality)

import networkx as nx
G = nx.karate_club_graph()

degree_c = nx.degree_centrality(G)
between_c = nx.betweenness_centrality(G, weight="weight")
# For large graphs, approximate: nx.betweenness_centrality(G, k=100)
close_c = nx.closeness_centrality(G)
eigen_c = nx.eigenvector_centrality(G, max_iter=1000)
pr = nx.pagerank(G, alpha=0.85)

# Compare top nodes across measures
for name, metric in [("Degree", degree_c), ("Betweenness", between_c),
                     ("Closeness", close_c), ("PageRank", pr)]:
    top = max(metric, key=metric.get)
    print(f"{name:12s}: top node={top}, score={metric[top]:.4f}")

Module 4: Path and Connectivity

import networkx as nx
G = nx.karate_club_graph()

# Shortest path
path = nx.shortest_path(G, source=0, target=33)
length = nx.shortest_path_length(G, source=0, target=33)
print(f"Shortest path 0->33: {path} (length {length})")
print(f"Average shortest path length: {nx.average_shortest_path_length(G):.3f}")

# Connected components
print(f"Connected: {nx.is_connected(G)}")
components = list(nx.connected_components(G))
print(f"Components: {len(components)}, largest: {len(max(components, key=len))}")

# For directed graphs: strong/weak connectivity
D = nx.DiGraph([(0,1),(1,2),(2,0),(3,4)])
print(f"Strongly connected: {list(nx.strongly_connected_components(D))}")

# Connectivity measures
print(f"Node connectivity: {nx.node_connectivity(G)}")
print(f"Edge connectivity: {nx.edge_connectivity(G)}")

Module 5: Community Detection

Partition networks into densely connected groups.

import networkx as nx
from networkx.algorithms import community
import itertools

G = nx.karate_club_graph()

# Greedy modularity maximization
comms_greedy = community.greedy_modularity_communities(G)
mod_score = community.modularity(G, comms_greedy)
print(f"Greedy: {len(comms_greedy)} communities, modularity={mod_score:.4f}")

# Label propagation (fast, non-deterministic)
comms_lpa = community.label_propagation_communities(G)
print(f"Label propagation: {len(list(comms_lpa))} communities")

# Girvan-Newman (hierarchical, edge betweenness removal)
gn = community.girvan_newman(G)
# Get first level of partition
first_level = next(gn)
print(f"Girvan-Newman first split: {len(first_level)} groups")
print(f"  Sizes: {[len(c) for c in first_level]}")

Module 6: I/O and Serialization

import networkx as nx
import pandas as pd
import json

G = nx.karate_club_graph()

# Edge list (simple text format)
nx.write_edgelist(G, "karate.edgelist")
G_loaded = nx.read_edgelist("karate.edgelist", nodetype=int)

# GraphML (preserves all attributes, XML-based)
nx.write_graphml(G, "karate.graphml")
G_xml = nx.read_graphml("karate.graphml")

# JSON (node-link format, web-friendly for d3.js)
data = nx.node_link_data(G)
with open("karate.json", "w") as f:
    json.dump(data, f)

# Pandas integration
df = pd.DataFrame({"source": [1,2,3], "target": [2,3,4], "weight": [0.5,1.0,0.75]})
G_pd = nx.from_pandas_edgelist(df, "source", "target", edge_attr="weight")
df_out = nx.to_pandas_edgelist(G_pd)
print(f"Pandas round-trip: {len(df_out)} edges")

# NumPy/SciPy matrices
A = nx.to_numpy_array(G)
print(f"Adjacency matrix shape: {A.shape}")
A_sparse = nx.to_scipy_sparse_array(G, format="csr")  # Memory-efficient

Module 7: Visualization

import networkx as nx
import matplotlib.pyplot as plt

G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)

# Color by degree, size by betweenness centrality
bc = nx.betweenness_centrality(G)
fig, ax = plt.subplots(figsize=(10, 8))
nx.draw(G, pos=pos, ax=ax,
        node_color=[G.degree(n) for n in G.nodes()], cmap=plt.cm.viridis,
        node_size=[3000 * bc[n] + 100 for n in G.nodes()],
        edge_color="gray", alpha=0.8, with_labels=True, font_size=8)
plt.tight_layout()
plt.savefig("network.png", dpi=300, bbox_inches="tight")
plt.savefig("network.pdf", bbox_inches="tight")  # Vector format
print("Saved network.png and network.pdf")

Module 8: Generators

import networkx as nx

# Erdos-Renyi random graph: n nodes, edge probability p
G_er = nx.erdos_renyi_graph(n=200, p=0.05, seed=42)
print(f"ER: {G_er.number_of_nodes()} nodes, {G_er.number_of_edges()} edges")

# Barabasi-Albert scale-free (power-law degree distribution)
G_ba = nx.barabasi_albert_graph(n=200, m=3, seed=42)

# Watts-Strogatz small-world
G_ws = nx.watts_strogatz_graph(n=200, k=6, p=0.1, seed=42)
print(f"WS clustering: {nx.average_clustering(G_ws):.3f}")

# Stochastic block model (community structure)
sizes, probs = [50, 50, 50], [[0.25,0.05,0.02],[0.05,0.35,0.07],[0.02,0.07,0.40]]
G_sbm = nx.stochastic_block_model(sizes, probs, seed=42)

# Built-in datasets and classic graphs
G_karate = nx.karate_club_graph()       # Zachary's karate club
G_grid = nx.grid_2d_graph(5, 7)         # 2D lattice
G_tree = nx.random_tree(n=50, seed=42)  # Random tree
G_geo = nx.random_geometric_graph(n=100, radius=0.2, seed=42)
# See references/algorithms_generators.md for full generator catalog

Key Concepts

Graph Types

Attribute Patterns

Attributes are stored as dictionaries at graph, node, and edge levels:

import networkx as nx
G = nx.Graph(name="example")              # Graph-level attribute
G.add_node(1, label="hub", weight=1.5)    # Node attributes
G.add_edge(1, 2, weight=0.8, type="ppi")  # Edge attributes

# Bulk set/get
nx.set_node_attributes(G, {1: "red", 2: "blue"}, "color")
colors = nx.get_node_attributes(G, "color")  # {1: 'red', 2: 'blue'}

Layout Algorithms

Common Workflows

Workflow 1: Social Network Analysis

Goal: Identify influential actors, detect communities, and visualize.

import networkx as nx
import matplotlib.pyplot as plt
from networkx.algorithms import community

# Step 1: Load network and basic stats
G = nx.karate_club_graph()
print(f"Network: {G.number_of_nodes()} actors, {G.number_of_edges()} ties")
print(f"Density: {nx.density(G):.4f}, Clustering: {nx.average_clustering(G):.4f}")

# Step 2: Identify influential nodes
bc = nx.betweenness_centrality(G)
top_bc = sorted(bc.items(), key=lambda x: x[1], reverse=True)[:5]
print("Top 5 by betweenness:", [(n, f"{s:.3f}") for n, s in top_bc])

# Step 3: Detect communities
comms = community.greedy_modularity_communities(G)
print(f"Communities: {len(comms)}, modularity: {community.modularity(G, comms):.4f}")

# Step 4: Visualize with community coloring
pos = nx.spring_layout(G, seed=42)
fig, ax = plt.subplots(figsize=(10, 8))
for i, comm in enumerate(comms):
    nx.draw_networkx_nodes(G, pos, nodelist=list(comm), ax=ax,
                           node_color=[plt.cm.Set2(i)]*len(comm), node_size=400)
nx.draw_networkx_edges(G, pos, ax=ax, alpha=0.3)
nx.draw_networkx_labels(G, pos, ax=ax, font_size=8)
plt.axis("off")
plt.tight_layout()
plt.savefig("social_network_analysis.png", dpi=300, bbox_inches="tight")
print("Saved social_network_analysis.png")

Workflow 2: Biological Interaction Network

Goal: Build a PPI network from tabular data, analyze topology, and identify hub proteins.

import networkx as nx
import pandas as pd

# Step 1: Load interaction data from DataFrame
interactions = pd.DataFrame({
    "protein_a": ["TP53","TP53","BRCA1","BRCA1","MDM2","ATM","ATM","CHEK2","RB1","CDK2"],
    "protein_b": ["MDM2","BRCA1","ATM","CHEK2","RB1","CHEK2","BRCA2","CDC25A","CDK2","CCNA2"],
    "score": [0.99, 0.95, 0.92, 0.88, 0.91, 0.97, 0.85, 0.90, 0.87, 0.93]
})
G = nx.from_pandas_edgelist(interactions, "protein_a", "protein_b",
                             edge_attr="score")
print(f"PPI network: {G.number_of_nodes()} proteins, {G.number_of_edges()} interactions")

# Step 2: Network statistics
print(f"Connected: {nx.is_connected(G)}")
print(f"Diameter: {nx.diameter(G)}")
print(f"Avg path length: {nx.average_shortest_path_length(G):.2f}")
print(f"Transitivity: {nx.transitivity(G):.4f}")

# Step 3: Hub identification (multiple centrality measures)
degree_c = nx.degree_centrality(G)
between_c = nx.betweenness_centrality(G)
close_c = nx.closeness_centrality(G)

results = pd.DataFrame({
    "protein": list(G.nodes()),
    "degree_centrality": [degree_c[n] for n in G.nodes()],
    "betweenness": [between_c[n] for n in G.nodes()],
    "closeness": [close_c[n] for n in G.nodes()],
}).sort_values("betweenness", ascending=False)
print("\nHub proteins:")
print(results.head(5).to_string(index=False))

# Step 4: Export for downstream analysis
nx.write_graphml(G, "ppi_network.graphml")
results.to_csv("protein_centrality.csv", index=False)
print("Exported ppi_network.graphml and protein_centrality.csv")

Key Parameters

Best Practices

Always set random seeds for reproducible generators and layouts: seed=42 in both erdos_renyi_graph() and spring_layout().
Use approximate algorithms for large graphs: nx.betweenness_centrality(G, k=500) samples k nodes instead of all pairs.
Prefer from_pandas_edgelist over manual add_edge loops for bulk data loading -- handles attributes cleanly and is faster.
Copy subgraphs before modification: G.subgraph(nodes) returns a read-only view; call .copy() for a mutable independent graph.
Use GraphML or GML for persistent storage to preserve all node/edge attributes. Edge lists lose metadata unless explicitly handled.
Convert graph types explicitly: D.to_undirected() (DiGraph -> Graph), nx.Graph(M) (MultiGraph -> Graph, collapses multi-edges).
Use sparse matrices for large adjacency exports: to_scipy_sparse_array() is far more memory-efficient than to_numpy_array().
Anti-pattern -- Don't use nx.info(): Deprecated; use G.number_of_nodes(), G.number_of_edges(), nx.density(G) directly.
Anti-pattern -- Don't assume node ordering: Algorithms may return results in different orders. Always index by node key, not position.

Common Recipes

Recipe: Minimum Spanning Tree

Extract the minimum spanning tree and compare to the original graph.

import networkx as nx

# Create weighted graph
G = nx.erdos_renyi_graph(50, 0.15, seed=42)
for u, v in G.edges():
    G[u][v]["weight"] = round(nx.utils.py_random_state(42).random(), 2)

mst = nx.minimum_spanning_tree(G, weight="weight")
print(f"Original: {G.number_of_edges()} edges")
print(f"MST: {mst.number_of_edges()} edges")
total_weight = sum(d["weight"] for _, _, d in mst.edges(data=True))
print(f"MST total weight: {total_weight:.2f}")

Recipe: Graph Coloring and Cliques

Find cliques and compute graph coloring.

import networkx as nx

G = nx.karate_club_graph()

# Find all maximal cliques
cliques = list(nx.find_cliques(G))
print(f"Maximal cliques: {len(cliques)}")
largest_clique = max(cliques, key=len)
print(f"Largest clique size: {len(largest_clique)}, nodes: {largest_clique}")

# Greedy graph coloring
coloring = nx.greedy_color(G, strategy="largest_first")
n_colors = max(coloring.values()) + 1
print(f"Chromatic number (greedy upper bound): {n_colors}")

Recipe: DAG and Topological Sort

Build a directed acyclic graph and find execution order.

import networkx as nx

# Task dependency DAG
D = nx.DiGraph()
D.add_edges_from([
    ("download_data", "preprocess"),
    ("download_data", "validate"),
    ("preprocess", "analyze"),
    ("validate", "analyze"),
    ("analyze", "visualize"),
    ("analyze", "report"),
    ("visualize", "report"),
])

print(f"Is DAG: {nx.is_directed_acyclic_graph(D)}")
order = list(nx.topological_sort(D))
print(f"Execution order: {order}")

# Find all paths from start to end
paths = list(nx.all_simple_paths(D, "download_data", "report"))
print(f"Paths to report: {len(paths)}")
for p in paths:
    print(f"  {' -> '.join(p)}")

Troubleshooting

Bundled Resources

Migrated from original entry (STUB: 436-line main file + 2,014 lines across 5 reference files, main/total = 17.8%).

references/algorithms_generators.md

Original file disposition:

algorithms.md (383 lines): Top algorithms relocated to Core API Modules 3-5 + Recipes. Remaining (traversal, cliques, coloring, isomorphism, matching, cycles, trees) -> this reference.
generators.md (378 lines): Core generators relocated to Module 8. Full catalog (classic, lattice, tree, bipartite, degree sequence, operators) -> this reference.

references/io_visualization.md

Original file disposition:

io.md (441 lines): Core formats relocated to Module 6. Remaining formats + format selection guide -> this reference.
visualization.md (529 lines): Basic matplotlib relocated to Module 7. Advanced techniques (Plotly, PyVis, 3D, bipartite, community coloring) -> this reference.

Fully consolidated original file

graph-basics.md (283 lines): Fully consolidated into main SKILL.md. Graph types -> Key Concepts. Node/edge operations, attributes, subgraphs -> Core API Modules 1-2. Diagnostics -> Common Workflows. Memory/float-point considerations -> Best Practices + Troubleshooting. Omitted: nx.info() (deprecated).

Related Skills

torch-geometric-graph-neural-networks -- graph neural networks (GCN, GAT, GraphSAGE) for node/graph classification and link prediction on graph-structured data
matplotlib-scientific-plotting -- advanced figure customization beyond NetworkX's built-in nx.draw
plotly-interactive-plots -- interactive network plots with hover, zoom, and pan
pandas (planned) -- DataFrame operations for preparing edge/node data before graph construction
scipy (planned) -- sparse matrix operations and numerical algorithms used by NetworkX internally

References

NetworkX documentation -- official docs and API reference
NetworkX tutorial -- official getting started guide
NetworkX GitHub -- source code and issue tracker
NetworkX gallery -- example gallery with visualizations
Hagberg, A., Schult, D., & Swart, P. (2008). Exploring network structure, dynamics, and function using NetworkX. SciPy 2008.

Adoption

jaechang-hits/networkx-graph-analysis

$ install --global

Security Scan Results

SKILL.md

NetworkX Graph Analysis

Overview

When to Use

Prerequisites

Quick Start

Core API

Module 1: Graph Creation and Types

Module 2: Node and Edge Operations

Module 3: Graph Analysis (Centrality)

Module 4: Path and Connectivity

Module 5: Community Detection

Module 6: I/O and Serialization

Module 7: Visualization

Module 8: Generators

Key Concepts

Graph Types

Attribute Patterns

Layout Algorithms

Common Workflows

Workflow 1: Social Network Analysis

Workflow 2: Biological Interaction Network

Key Parameters

Best Practices

Common Recipes

Recipe: Minimum Spanning Tree

Recipe: Graph Coloring and Cliques

Recipe: DAG and Topological Sort

Troubleshooting

Bundled Resources

references/algorithms_generators.md

references/io_visualization.md

Fully consolidated original file

Related Skills

References

Related Skills

jaechang-hits/deseq2-differential-expression

jaechang-hits/vcf-variant-filtering

jaechang-hits/snpeff-variant-annotation

jaechang-hits/plink2-gwas-analysis

jaechang-hits/networkx-graph-analysis

$ install --global

Security Scan Results

SKILL.md

NetworkX Graph Analysis

Overview

When to Use

Prerequisites

Quick Start

Core API

Module 1: Graph Creation and Types

Module 2: Node and Edge Operations

Module 3: Graph Analysis (Centrality)

Module 4: Path and Connectivity

Module 5: Community Detection

Module 6: I/O and Serialization

Module 7: Visualization

Module 8: Generators

Key Concepts

Graph Types

Attribute Patterns

Layout Algorithms

Common Workflows

Workflow 1: Social Network Analysis

Workflow 2: Biological Interaction Network

Key Parameters

Best Practices

Common Recipes

Recipe: Minimum Spanning Tree

Recipe: Graph Coloring and Cliques

Recipe: DAG and Topological Sort

Troubleshooting

Bundled Resources

references/algorithms_generators.md

references/io_visualization.md

Fully consolidated original file