Elmanda1/Add New Engine

Name: Add New Engine
Author: Elmanda1

Path: `.agents/skills/add_engine/SKILL.md`

Install:

```shell
npx skillsauth add Elmanda1/nexus_datagen Add New Engine
```


How to Add a New Engine

This skill documents the complete process for adding a new data source engine to DataPipeline OS, following the established generator pattern and pluggable architecture.

Prerequisites

- Familiarity with the unified 5-column output schema: `keyword`, `source`, `date`, `content`, `engagement_score`.
- A new data source that is reachable through an API, an RSS feed, or web scraping.
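
One way to hold an engine to this schema is to check each row before it is yielded. The helper below is a hypothetical sketch (`validate_row` is not part of DataPipeline OS); it assumes dates use the `YYYY-MM-DD` format shown in the template and that `engagement_score` is an integer.

```python
from datetime import datetime

# The unified 5-column schema described above.
SCHEMA = ("keyword", "source", "date", "content", "engagement_score")


def validate_row(row: dict) -> list:
    """Return a list of problems with one output row (an empty list means valid).

    Hypothetical helper: assumes ISO dates (YYYY-MM-DD) and integer scores.
    """
    problems = []
    missing = [col for col in SCHEMA if col not in row]
    extra = sorted(set(row) - set(SCHEMA))
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if "date" in row:
        try:
            datetime.strptime(str(row["date"]), "%Y-%m-%d")
        except ValueError:
            problems.append(f"bad date: {row['date']!r}")
    score = row.get("engagement_score")
    if score is not None and not isinstance(score, int):
        problems.append(f"engagement_score is not an int: {score!r}")
    return problems


good = {"keyword": "ai", "source": "demo", "date": "2023-05-01",
        "content": "hello", "engagement_score": 42}
print(validate_row(good))                  # []
print(validate_row({**good, "url": "x"}))  # ["unexpected columns: ['url']"]
```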

Steps

1. Create the Engine File

Create `engines/<source>_engine.py` from this template:

"""
engines/<source>_engine.py

<Source Name> scraper.
- Generator pattern: chunk → yield → clear() — RAM flat O(chunk_size)
- Dual mode: set <ENV_VAR> in .env for live, blank = simulation
"""

import os
import time
import random
from typing import Generator, List


class <Source>Engine:

    SAFE_COLUMNS = {"keyword", "source", "date", "content", "engagement_score"}

    def __init__(
        self,
        keywords:   List[str],
        start_date: str,
        end_date:   str,
        language:   str = "id",
        chunk_size: int = 500,
    ):
        self.keywords   = keywords
        self.start_date = start_date
        self.end_date   = end_date
        self.language   = language
        self.chunk_size = chunk_size
        self.all_data:  List[dict] = []

    def fetch(self) -> Generator[List[dict], None, None]:
        api_key = os.getenv("<ENV_VAR>", "").strip()
        if api_key:
            yield from self._fetch_live(api_key)
        else:
            yield from self._fetch_simulation()

    def _fetch_live(self, api_key: str) -> Generator[List[dict], None, None]:
        buffer: List[dict] = []

        for keyword in self.keywords:
            # ... fetch data from API ...
            for item in api_results:
                buffer.append({
                    "keyword":          keyword,
                    "source":           "<source_name>",
                    "date":             "YYYY-MM-DD",
                    "content":          "text content here",
                    "engagement_score": 0,
                })
                if len(buffer) >= self.chunk_size:
                    self.all_data.extend(buffer)
                    yield buffer
                    buffer.clear()

            if buffer:
                self.all_data.extend(buffer)
                yield buffer
                buffer.clear()

    def _fetch_simulation(self) -> Generator[List[dict], None, None]:
        for keyword in self.keywords:
            for _ in range(random.randint(3, 7)):
                buffer: List[dict] = []
                size = random.randint(int(self.chunk_size * 0.5), self.chunk_size)
                for _ in range(size):
                    buffer.append({
                        "keyword":          keyword,
                        "source":           "<source_name>",
                        "date":             f"2023-{random.randint(1,12):02d}-{random.randint(1,28):02d}",
                        "content":          f"Simulated content about {keyword}",
                        "engagement_score": random.randint(0, 10000),
                    })
                self.all_data.extend(buffer)
                time.sleep(0.15)
                yield buffer
                buffer.clear()
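
Because the template clears each chunk as soon as the generator is resumed, a consumer has to write out (or copy) a chunk before requesting the next one. The sketch below uses a hypothetical, simulation-only `DemoEngine` that reuses one buffer the way `_fetch_live` does (the `<Source>` placeholders themselves are not valid Python). It streams chunks into a CSV as they arrive and shows the pitfall of collecting chunks with `list(...)`.

```python
import csv
import io
from typing import Generator, List

COLUMNS = ["keyword", "source", "date", "content", "engagement_score"]


class DemoEngine:
    """Hypothetical stand-in that reuses one buffer, like the template's _fetch_live."""

    def __init__(self, keywords: List[str], rows_per_keyword: int, chunk_size: int):
        self.keywords = keywords
        self.rows_per_keyword = rows_per_keyword
        self.chunk_size = chunk_size
        self.all_data: List[dict] = []

    def fetch(self) -> Generator[List[dict], None, None]:
        buffer: List[dict] = []
        for keyword in self.keywords:
            for i in range(self.rows_per_keyword):
                buffer.append({
                    "keyword": keyword,
                    "source": "demo",
                    "date": "2023-01-01",
                    "content": f"{keyword} #{i}",
                    "engagement_score": i,
                })
                if len(buffer) >= self.chunk_size:
                    self.all_data.extend(buffer)
                    yield buffer
                    buffer.clear()
            if buffer:  # flush the partial chunk for this keyword
                self.all_data.extend(buffer)
                yield buffer
                buffer.clear()


def stream_to_csv(engine: DemoEngine, out) -> int:
    """Write each chunk as soon as it arrives; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    written = 0
    for chunk in engine.fetch():
        writer.writerows(chunk)  # consume the chunk before the engine clears it
        written += len(chunk)
    return written


out = io.StringIO()
print(stream_to_csv(DemoEngine(["ai", "ml"], rows_per_keyword=5, chunk_size=2), out))  # 10

# Pitfall: every collected chunk is the same, now-empty list.
print(list(DemoEngine(["ai"], rows_per_keyword=5, chunk_size=2).fetch()))  # [[], [], []]
```

Writing each chunk immediately is what keeps the working memory bounded; collecting the yielded lists instead leaves only references to cleared buffers.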

2. Register in `app.py`

Import the engine class and append an entry to the `ENGINE_REGISTRY` list in `app.py`:

```python
from engines.<source>_engine import <Source>Engine

ENGINE_REGISTRY = [
    # ... existing engines ...
    # (source_name, engine class, eng_key, 4-character code)
    ("<source_name>", <Source>Engine, "<eng_key>", "<4CHAR>"),
]
```
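
This guide does not show how `app.py` uses the registry. Assuming the tuple fields are (source name, engine class, state key, 4-character code) and that the checkbox values sent by the frontend match the source names, a lookup could look like the sketch below; `NewsEngine`, `ForumEngine`, and `build_engines` are hypothetical.

```python
from typing import Dict, List, Tuple, Type


class NewsEngine:  # hypothetical stub standing in for a real engine class
    def __init__(self, keywords, start_date, end_date, language="id", chunk_size=500):
        self.keywords = keywords
        self.start_date = start_date
        self.end_date = end_date


class ForumEngine(NewsEngine):  # second hypothetical stub
    pass


# (source_name, engine class, eng_key, 4-char code), mirroring the tuple shape above.
ENGINE_REGISTRY: List[Tuple[str, Type, str, str]] = [
    ("news", NewsEngine, "news", "NEWS"),
    ("forum", ForumEngine, "forum", "FORM"),
]

# Sanity check: source names and state keys should each be unique.
assert len({entry[0] for entry in ENGINE_REGISTRY}) == len(ENGINE_REGISTRY)
assert len({entry[2] for entry in ENGINE_REGISTRY}) == len(ENGINE_REGISTRY)


def build_engines(selected: List[str], keywords: List[str],
                  start_date: str, end_date: str) -> Dict[str, object]:
    """Instantiate the engines whose source_name was selected, keyed by eng_key."""
    engines = {}
    for source_name, engine_cls, eng_key, _code in ENGINE_REGISTRY:
        if source_name in selected:
            engines[eng_key] = engine_cls(keywords, start_date, end_date)
    return engines


engines = build_engines(["forum"], ["ai"], "2023-01-01", "2023-12-31")
print(list(engines))  # ['forum']
```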

Then add an initial state entry for the engine to `pipeline_state["engines"]`, using the same `<eng_key>`:

```python
"<eng_key>": {"status": "idle", "rows": 0, "ram_mb": 0},
```

3. Update Frontend

In `frontend/templates/index.html`:

- Add a checkbox to the platform selection form with `value="<source_name>"`.
- Add an engine row to the monitor panel with the matching `data-engine="<eng_key>"`.
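
Below is a hypothetical check that both frontend hooks are present, using only the standard library's `html.parser`. It assumes the checkbox is an `<input type="checkbox">` and that `data-engine` can sit on any element; `HookFinder` and `frontend_hooks_present` are illustrative names, not part of DataPipeline OS.

```python
from html.parser import HTMLParser


class HookFinder(HTMLParser):
    """Collect checkbox values and data-engine attributes from an HTML template."""

    def __init__(self):
        super().__init__()
        self.checkbox_values = set()
        self.engine_keys = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes such as `checked` map to None
        if tag == "input" and attrs.get("type") == "checkbox" and attrs.get("value"):
            self.checkbox_values.add(attrs["value"])
        if attrs.get("data-engine"):
            self.engine_keys.add(attrs["data-engine"])


def frontend_hooks_present(html: str, source_name: str, eng_key: str) -> bool:
    """True if the checkbox value and the monitor row's data-engine both exist."""
    finder = HookFinder()
    finder.feed(html)
    return source_name in finder.checkbox_values and eng_key in finder.engine_keys


# Usage against the template from this guide:
# with open("frontend/templates/index.html", encoding="utf-8") as f:
#     print(frontend_hooks_present(f.read(), "<source_name>", "<eng_key>"))
```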

4. Update Environment

- Add the API key variable (`<ENV_VAR>`) to `.env.example`, with instructions on how to obtain it.
- Add any new pip dependencies to `requirements.txt`.

5. Test

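- Run in simulation mode first (leave `<ENV_VAR>` blank or unset).
- Verify that the output CSV has the correct 5-column schema.
- Set the API key in `.env` and test live mode.
- Verify that the generator pattern works: each yielded chunk holds at most `chunk_size` rows, so the working buffer stays flat (`self.all_data` still grows with the total row count).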
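
Two of these checks can be scripted. A minimal sketch, assuming the output CSV starts with a header row and the engine exposes `fetch()` and `chunk_size` as in the template; `csv_has_schema`, `max_chunk_len`, and `fake_fetch` are hypothetical helpers.

```python
import csv
import io
from typing import Iterable, Iterator, List

EXPECTED_COLUMNS = {"keyword", "source", "date", "content", "engagement_score"}


def csv_has_schema(text_stream) -> bool:
    """True if the CSV header row holds exactly the 5 schema columns (any order)."""
    header = next(csv.reader(text_stream), None)
    return header is not None and len(header) == 5 and set(header) == EXPECTED_COLUMNS


def max_chunk_len(chunks: Iterable[List[dict]]) -> int:
    """Largest chunk seen; len() runs before the generator resumes and clears it."""
    return max((len(chunk) for chunk in chunks), default=0)


def fake_fetch(chunk_size: int) -> Iterator[List[dict]]:
    """Hypothetical stand-in for engine.fetch(): 7 rows, chunked like the template."""
    buffer: List[dict] = []
    for i in range(7):
        buffer.append({"keyword": "ai", "engagement_score": i})
        if len(buffer) >= chunk_size:
            yield buffer
            buffer.clear()
    if buffer:
        yield buffer


sample = io.StringIO("keyword,source,date,content,engagement_score\nai,demo,2023-01-01,hi,1\n")
print(csv_has_schema(sample))        # True
print(max_chunk_len(fake_fetch(3)))  # 3

# Against real output (file name is hypothetical):
# with open("output.csv", newline="", encoding="utf-8") as f:
#     assert csv_has_schema(f)
# assert max_chunk_len(engine.fetch()) <= engine.chunk_size
```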

Critical Rules

- Always use the generator pattern: `yield buffer`, then `buffer.clear()`.
- Always accumulate rows into `self.all_data`; the schema mapper needs them.
- Output only the 5 schema columns; the cleaner strips any extra columns.
- Include a simulation fallback; the engine must work without API keys.
- The `source` value in the output must be unique and recognizable.
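
The column rule implies a downstream cleaning step, and the uniqueness rule can be checked mechanically. Neither the cleaner's nor the registry's actual code is shown in this guide, so the following is a hypothetical sketch: `strip_extra_columns` mimics the described stripping using the template's `SAFE_COLUMNS`, and `duplicate_sources` flags `source` values used by more than one engine.

```python
from typing import Dict, List

SAFE_COLUMNS = {"keyword", "source", "date", "content", "engagement_score"}


def strip_extra_columns(rows: List[Dict]) -> List[Dict]:
    """Return copies of the rows that keep only the 5 safe columns."""
    return [{k: v for k, v in row.items() if k in SAFE_COLUMNS} for row in rows]


def duplicate_sources(source_names: List[str]) -> List[str]:
    """Source values claimed by more than one engine; should come back empty."""
    seen, dupes = set(), []
    for name in source_names:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes


row = {"keyword": "ai", "source": "demo", "date": "2023-01-01",
       "content": "hi", "engagement_score": 1, "url": "https://example.com"}
print(sorted(strip_extra_columns([row])[0]))  # ['content', 'date', 'engagement_score', 'keyword', 'source']
print(duplicate_sources(["news", "forum", "news"]))  # ['news']
```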

Description: Step-by-step guide to add a new data source engine to DataPipeline OS
Category: development
Updated: Apr 16, 2026

The `npx skillsauth` command above installs this skill globally; the listing states that it works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean; the remaining 6 were skipped.

| Scanner | Purpose | Status | Reported score |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 20, 2026, 3:33 PM (20.0s, 1 file scanned).

Related Skills

- Elmanda1/Troubleshoot DataPipeline OS (development; updated Apr 16, 2026): Diagnose and fix common issues in DataPipeline OS.
- Elmanda1/Run Pipeline (development; updated Apr 16, 2026): How to set up and run the DataPipeline OS extraction pipeline.
- openclaw/openclaw-secret-scanning-maintainer (development; updated Apr 15, 2026): Maintainer-only workflow for handling GitHub Secret Scanning alerts on OpenClaw. Use when Codex needs to triage, redact, clean up, and resolve secret leakage found in issue comments, issue bodies, PR comments, or other GitHub content.
- openclaw/openclaw-release-maintainer (development; updated Apr 10, 2026): Maintainer workflow for OpenClaw releases, prereleases, changelog release notes, and publish validation. Use when Codex needs to prepare or verify stable or beta release steps, align version naming, assemble release notes, check release auth requirements, or validate publish-time commands and artifacts.


Install manually

For Claude Code, clone the repository and copy the skill into the global skills folder:

```shell
# Clone the repo
git clone https://github.com/Elmanda1/nexus_datagen.git

# Copy into the Claude Code skills folder (global)
cp -r nexus_datagen/.agents/skills/add_engine ~/.claude/skills/
```

Repository: Elmanda1/nexus_datagen

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
