brycewang-stanford/citation-network-builder

Name: citation-network-builder
Author: brycewang-stanford

skills/43-wentorai-research-plugins/skills/tools/knowledge-graph/citation-network-builder/SKILL.md

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research citation-network-builder

Citation Network Builder

A skill for constructing, analyzing, and visualizing citation networks from academic reference data. Covers data collection from bibliographic databases, network construction using direct citation, co-citation, and bibliographic coupling methods, community detection for identifying research clusters, and practical visualization with tools like Gephi, VOSviewer, and Python NetworkX.

Data Collection and Preparation

Source Databases

Citation network analysis requires structured bibliographic data with reference lists. The choice of database determines coverage and available metadata.

Database Comparison for Citation Analysis:

Web of Science (Clarivate):
  - Format: ISI/WoS plain text, BibTeX, CSV
  - Coverage: ~21,000 journals, back to 1900
  - Strengths: Cited reference data is most complete
  - Limits: 1,000 records per export, subscription required
  - Best for: High-quality citation network analysis

Scopus (Elsevier):
  - Format: CSV, BibTeX, RIS
  - Coverage: ~27,000 journals, back to 1970s for most
  - Strengths: Broader coverage than WoS, author IDs
  - Limits: 2,000 records per export, subscription required
  - Best for: Broader disciplinary coverage

OpenAlex (free):
  - Format: JSON via REST API
  - Coverage: ~250M works, all disciplines
  - Strengths: Free, open, comprehensive, API access
  - Limits: Reference linking less complete than WoS
  - Best for: Large-scale analysis, reproducible research

CrossRef (free):
  - Format: JSON via REST API
  - Coverage: ~150M DOIs across all publishers
  - Strengths: Free, authoritative DOI metadata, reference linking
  - Limits: No abstract text, citation counts may lag
  - Best for: Cross-publisher networks, DOI resolution
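
As a rough illustration of preparing API data for the steps that follow, the sketch below flattens one OpenAlex work object into a record with a reference list. The field names (`id`, `doi`, `title`, `publication_year`, `referenced_works`) reflect the OpenAlex schema as commonly documented and should be checked against the current API reference; the helper name `openalex_work_to_record` and the sample values are hypothetical.

```python
def openalex_work_to_record(work):
    """Flatten one OpenAlex work object (a dict) into a flat record."""
    doi = work.get("doi") or ""
    # OpenAlex reports DOIs as full URLs; keep only the bare, lowercased DOI
    doi = doi.lower().replace("https://doi.org/", "").strip()
    return {
        "id": work.get("id"),
        "doi": doi or None,
        "title": work.get("title") or "",
        "year": work.get("publication_year"),
        # referenced_works holds OpenAlex work IDs, not DOIs
        "references": list(work.get("referenced_works") or []),
    }


# Hypothetical record, trimmed to the fields used above
sample_work = {
    "id": "https://openalex.org/W0000000001",
    "doi": "https://doi.org/10.1234/EXAMPLE.1",
    "title": "An example paper",
    "publication_year": 2020,
    "referenced_works": [
        "https://openalex.org/W0000000002",
        "https://openalex.org/W0000000003",
    ],
}
record = openalex_work_to_record(sample_work)
```

Because the reference list here contains OpenAlex IDs rather than DOIs, a network built from such records would probably use `id` as the node key, or resolve the IDs to DOIs in a separate step.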

Data Cleaning for Network Construction

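import pandas as pd

def clean_bibliographic_data(records):
    """
    Clean and deduplicate bibliographic records for network construction.

    Steps implemented here:
    1. Standardize DOIs (lowercase, strip resolver prefixes)
    2. Drop records without a DOI, then deduplicate by DOI
    3. Drop records without references

    Title-similarity deduplication and parsing of raw reference strings
    are not implemented here; 'references' is expected to already hold
    a list of cited DOIs.
    """
    # Work on a copy so the caller's DataFrame is not modified
    records = records.copy()

    # Standardize DOIs: trim first so the prefix pattern matches at the start
    records["doi"] = (
        records["doi"]
        .str.strip()
        .str.lower()
        .str.replace(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", "", regex=True)
    )

    # Records without a DOI cannot serve as node IDs; left in place,
    # drop_duplicates would also collapse them all into a single row
    records = records[records["doi"].notna() & (records["doi"] != "")]

    # Remove duplicates by DOI
    records = records.drop_duplicates(subset="doi", keep="first")

    # Filter records without references (cannot build citation links)
    has_refs = records["references"].notna() & (records["references"].str.len() > 0)
    return records[has_refs].reset_index(drop=True)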
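
Deduplication by title similarity, mentioned in the steps above, might be sketched as follows. This is a minimal illustration using the standard library's `difflib`, not the skill author's method; the function names and the 0.95 threshold are assumptions, and the pairwise comparison is only practical for small record sets.

```python
import re
from difflib import SequenceMatcher

import pandas as pd


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^a-z0-9\s]", " ", str(title).lower())
    return " ".join(title.split())


def drop_near_duplicate_titles(records, threshold=0.95):
    """Keep the first record of each group of near-identical titles.

    Every title is compared against all titles kept so far, which is
    O(n^2) in the number of records.
    """
    kept_titles = []
    keep_mask = []
    for title in records["title"]:
        norm = normalize_title(title)
        is_dup = any(
            SequenceMatcher(None, norm, seen).ratio() >= threshold
            for seen in kept_titles
        )
        keep_mask.append(not is_dup)
        if not is_dup:
            kept_titles.append(norm)
    return records[keep_mask].reset_index(drop=True)


# Toy data: the first two titles differ only in case and punctuation
toy = pd.DataFrame({
    "title": [
        "Deep Learning for Citations",
        "Deep learning for citations.",
        "Graph Theory Basics",
    ]
})
deduped = drop_near_duplicate_titles(toy)
```

For larger corpora, a blocking step (for example, comparing only titles that share a publication year) would likely be needed to keep the number of comparisons manageable.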

Network Construction Methods

Direct Citation Network

The simplest approach: when paper A cites paper B, add a directed edge from A to B.

import networkx as nx

def build_direct_citation_network(records):
    """
    Build a directed citation network.
    Nodes = papers, Edges = citation relationships.

    Args:
        records: DataFrame with 'doi' and 'references' columns
                 where 'references' is a list of cited DOIs
    Returns:
        NetworkX DiGraph
    """
    G = nx.DiGraph()

    for _, row in records.iterrows():
        citing_doi = row["doi"]
        G.add_node(citing_doi, title=row.get("title", ""),
                   year=row.get("year", None))

        for ref_doi in row["references"]:
            G.add_edge(citing_doi, ref_doi)

    return G
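
One consequence of this construction is that every cited DOI becomes a node, including papers that are not in the collected dataset. The sketch below, using made-up DOIs, rebuilds a tiny network in the same way and then restricts it to the corpus so that in-degree counts only citations between collected papers. Whether to keep or drop external nodes depends on the research question; this shows just one option.

```python
import networkx as nx
import pandas as pd

# Toy corpus with made-up DOIs; "10.1/x" is cited but not part of the corpus
records = pd.DataFrame({
    "doi": ["10.1/a", "10.1/b", "10.1/c"],
    "references": [["10.1/b", "10.1/c", "10.1/x"], ["10.1/c"], []],
})

G = nx.DiGraph()
for doi, refs in zip(records["doi"], records["references"]):
    G.add_node(doi)
    G.add_edges_from((doi, ref) for ref in refs)

# Keep only papers that belong to the corpus itself
corpus = set(records["doi"])
G_internal = G.subgraph(corpus).copy()

# In-degree = citations received from other papers in the corpus
citations_received = dict(G_internal.in_degree())
```

Here the full graph has four nodes, while the internal graph keeps the three corpus papers and the three citations among them.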

Co-Citation Network

Two papers are co-cited when a third paper cites both. Co-citation strength is the number of papers that cite both. This method identifies intellectual relationships between cited works.

from itertools import combinations
from collections import Counter

def build_cocitation_network(records, min_cocitations=2):
    """
    Build an undirected co-citation network.
    Nodes = cited papers, Edges = co-citation frequency.
    """
    pair_counts = Counter()

    for _, row in records.iterrows():
        refs = sorted(set(row["references"]))
        for a, b in combinations(refs, 2):
            pair_counts[(a, b)] += 1

    G = nx.Graph()
    for (a, b), count in pair_counts.items():
        if count >= min_cocitations:
            G.add_edge(a, b, weight=count)

    return G

Bibliographic Coupling Network

Two papers are bibliographically coupled when they share one or more references. This method groups papers with similar theoretical or methodological foundations.

def build_bibliographic_coupling_network(records, min_shared=3):
    """
    Build an undirected bibliographic coupling network.
    Nodes = citing papers, Edges = number of shared references.
    """
    ref_sets = {}
    for _, row in records.iterrows():
        ref_sets[row["doi"]] = set(row["references"])

    G = nx.Graph()
    dois = list(ref_sets.keys())
    for i in range(len(dois)):
        for j in range(i + 1, len(dois)):
            shared = len(ref_sets[dois[i]] & ref_sets[dois[j]])
            if shared >= min_shared:
                G.add_edge(dois[i], dois[j], weight=shared)

    return G
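
Co-citation and bibliographic coupling are two views of the same citing-by-cited matrix. If A[i, j] = 1 when paper i cites reference j, then co-citation counts are the off-diagonal entries of A.T @ A and coupling counts are the off-diagonal entries of A @ A.T. The NumPy sketch below illustrates this on toy data; it is an alternative formulation for intuition and checking, not a replacement for the functions above.

```python
import numpy as np

# Rows = citing papers p0..p2, columns = cited references r0..r3.
# A[i, j] = 1 if paper i cites reference j (toy data).
A = np.array([
    [1, 1, 1, 0],   # p0 cites r0, r1, r2
    [1, 1, 0, 1],   # p1 cites r0, r1, r3
    [0, 1, 1, 0],   # p2 cites r1, r2
])

# Co-citation: entry (j, k) counts papers that cite both reference j and k
cocitation = A.T @ A
np.fill_diagonal(cocitation, 0)

# Bibliographic coupling: entry (i, k) counts references shared by papers i and k
coupling = A @ A.T
np.fill_diagonal(coupling, 0)
```

In this example r0 and r1 are co-cited twice (by p0 and p1), and p0 and p1 share two references (r0 and r1). For large corpora the same products can be computed on sparse matrices, which avoids the explicit pairwise loop.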

Network Analysis

Key Metrics

Node-level metrics:
  - In-degree (direct citation): number of times a paper is cited
    -> identifies influential papers
  - Betweenness centrality: how often a node lies on shortest paths
    -> identifies bridging papers connecting subfields
  - PageRank: iterative importance score based on who cites the paper
    -> identifies papers cited by other influential papers

Network-level metrics:
  - Density: proportion of possible edges that exist
  - Clustering coefficient: tendency of nodes to form triangles
  - Average path length: mean shortest path between node pairs
  - Number of connected components: isolated clusters
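
The metrics listed above map onto standard NetworkX calls. The sketch below computes them on a small made-up citation graph; the node names are hypothetical, and the choice to compute clustering, components, and path length on the undirected version of the graph is an assumption made for simplicity. Note that `nx.pagerank` relies on SciPy in recent NetworkX releases.

```python
import networkx as nx

# Toy directed citation graph: an edge (u, v) means u cites v
G = nx.DiGraph()
G.add_edges_from([
    ("p4", "p1"), ("p4", "p2"), ("p3", "p1"),
    ("p3", "p2"), ("p2", "p1"), ("p5", "p3"),
])

# Node-level metrics
in_degree = dict(G.in_degree())                # citations received
betweenness = nx.betweenness_centrality(G)     # bridging role
pagerank = nx.pagerank(G, alpha=0.85)          # importance weighted by citers

# Network-level metrics
density = nx.density(G)
U = G.to_undirected()
avg_clustering = nx.average_clustering(U)
n_components = nx.number_connected_components(U)

# Average path length is only defined within a connected component,
# so it is computed on the largest one
largest = U.subgraph(max(nx.connected_components(U), key=len))
avg_path_length = nx.average_shortest_path_length(largest)

most_cited = max(in_degree, key=in_degree.get)
```

In this toy graph p1 receives the most citations and also has the highest PageRank, while p3 has the highest betweenness because it sits on the only short routes from p5 to the rest of the graph.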

Community Detection

Community detection algorithms identify clusters of densely connected papers, corresponding to research subfields or intellectual traditions.

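import community as community_louvain  # provided by the python-louvain package

def detect_communities(G):
    """
    Detect communities using the Louvain algorithm.
    Returns a dictionary mapping node -> community_id.

    Louvain operates on undirected graphs, so a directed citation
    network is converted to its undirected form first.
    """
    if G.is_directed():
        G = G.to_undirected()

    # Edges without a "weight" attribute are treated as weight 1
    partition = community_louvain.best_partition(G, weight="weight")

    # Summarize communities
    communities = {}
    for node, comm_id in partition.items():
        communities.setdefault(comm_id, []).append(node)

    for comm_id, members in sorted(communities.items()):
        print(f"Community {comm_id}: {len(members)} papers")

    return partition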
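
If installing a separate Louvain package is not desirable, recent NetworkX releases (2.8 and later, as far as I am aware) ship their own Louvain implementation. The sketch below is an alternative under that assumption, run on a made-up graph of two tightly connected groups joined by one weak edge; it also reports modularity as a rough quality check.

```python
import networkx as nx

# Toy graph: two 4-cliques joined by a single weak edge
G = nx.Graph()
clique_a = ["a1", "a2", "a3", "a4"]
clique_b = ["b1", "b2", "b3", "b4"]
for clique in (clique_a, clique_b):
    for i, u in enumerate(clique):
        for v in clique[i + 1:]:
            G.add_edge(u, v, weight=3)
G.add_edge("a4", "b1", weight=1)

# louvain_communities returns a list of node sets; the seed fixes the
# random node ordering so repeated runs give the same result
communities = nx.community.louvain_communities(G, weight="weight", seed=42)

# Convert to the node -> community_id mapping used above
partition = {
    node: comm_id
    for comm_id, members in enumerate(communities)
    for node in members
}

# Modularity near 0 suggests weak community structure; higher is stronger
modularity = nx.community.modularity(G, communities, weight="weight")
```

Louvain is randomized, so on real networks it is worth running it with several seeds and checking that the main clusters are stable before interpreting them as subfields.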

Visualization

Tool Recommendations

Gephi (desktop application):
  - Best for: Interactive exploration of medium networks (1k-50k nodes)
  - Layout algorithms: ForceAtlas2, Fruchterman-Reingold
  - Export: SVG, PDF, PNG
  - Workflow: Import GEXF/GraphML -> layout -> partition by community
              -> adjust sizes by centrality -> export

VOSviewer (desktop application):
  - Best for: Bibliometric networks specifically
  - Direct import from WoS/Scopus export files
  - Built-in clustering and overlay visualizations
  - Limitation: less customizable than Gephi

Python (matplotlib, pyvis):
  - Best for: Reproducible, scriptable visualizations
  - Use pyvis for interactive HTML network graphs
  - Use matplotlib for static publication-quality figures
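
To hand a network over to Gephi as in the workflow above, it can be written to GEXF with node attributes attached, so that partitioning by community and sizing by centrality need no extra import step. The sketch below uses made-up DOIs and hypothetical community labels, and writes to a temporary directory. Attribute values that are None may not serialize, so it is probably safer to fill or drop them before export.

```python
import os
import tempfile

import networkx as nx

# Toy weighted network with made-up DOIs
G = nx.Graph()
G.add_edge("10.1/a", "10.1/b", weight=3)
G.add_edge("10.1/b", "10.1/c", weight=1)

# Hypothetical community labels, e.g. from a community detection step
partition = {"10.1/a": 0, "10.1/b": 0, "10.1/c": 1}
nx.set_node_attributes(G, partition, name="community")

# Weighted degree as a simple size attribute for the layout tool
weighted_degree = dict(G.degree(weight="weight"))
nx.set_node_attributes(G, weighted_degree, name="weighted_degree")

out_dir = tempfile.mkdtemp()
gexf_path = os.path.join(out_dir, "citation_network.gexf")
nx.write_gexf(G, gexf_path)

# Read the file back to confirm that nodes and attributes survived
reloaded = nx.read_gexf(gexf_path)
```

`nx.write_graphml` works the same way if GraphML is preferred.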

Citation network analysis provides a quantitative lens on the structure of scientific knowledge, revealing invisible colleges, emerging research fronts, and foundational works that shape entire disciplines.

Build and analyze citation networks from academic reference data

1,661 stars

development

Updated Jun 4, 2026

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research citation-network-builder

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report:
  - Trivy (container and dependency vulnerability scanner): Clean, 95%
  - Semgrep (static code analysis for vulnerabilities): Clean, 95%
  - mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
  - Snyk (dep) (open source security scanning): Skipped, 50%
  - Socket.dev (supply chain security analysis): Skipped, 50%
  - VirusTotal (multi-engine malware detection): Skipped, 50%
  - CrowdStrike (advanced threat intelligence): Skipped, 50%
  - OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
  - OWASP Dep-Check: Skipped, 50%

Last scanned: Jun 4, 2026, 6:14 AM (192.5s, 1 file scanned)

SKILL.md

name: citation-network-builder
description: Build and analyze citation networks from academic reference data
emoji: 🕸️
category: tools
subcategory: knowledge-graph
keywords: ["citation network", "bibliometrics", "graph analysis", "co-citation", "bibliographic coupling", "network visualization"]
source: wentor-research-plugins

Install manually

# Clone the repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research.git

# Copy into Claude Code skills folder (global)
cp -r Awesome-Agent-Skills-for-Empirical-Research/skills/43-wentorai-research-plugins/skills/tools/knowledge-graph/citation-network-builder ~/.claude/skills/

Repository

brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

1,661 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT