skills/data-engineering/data-quality/SKILL.md
Ensures data accuracy, completeness, and consistency via validation, cleaning, and monitoring in data pipelines.
npx skillsauth add alphaonedev/openclaw-graph data-quality
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill ensures data accuracy, completeness, and consistency in pipelines by performing validation, cleaning, and monitoring. It integrates with data engineering workflows to catch issues early, using tools like rule-based checks and automated alerts.
Use this skill when building or maintaining data pipelines that handle large datasets, such as ETL processes, to prevent downstream errors. Apply it for compliance checks in regulated industries or when integrating new data sources that might introduce inconsistencies. Avoid it for simple data tasks without validation needs.
Invoke this skill via CLI for quick checks or integrate it into Python scripts for automated pipelines. Always load configuration files first, then run validations. For monitoring, set up recurring jobs. Example pattern: Load data, apply validation, clean if needed, and monitor outputs.
Use the OpenClaw CLI for direct execution or the REST API for programmatic access. Authentication requires setting the environment variable $OPENCLAW_API_KEY before running commands.
CLI Command: Validate a file with rules
openclaw data-quality validate --file data.csv --rules config.json --output report.txt
This checks data against rules in config.json and saves results to report.txt.
CLI Command: Clean and monitor data
openclaw data-quality clean --input data.parquet --rules clean_rules.json --monitor
Applies cleaning rules and enables monitoring for ongoing checks.
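The CLI commands above can also be driven from a Python pipeline step. The following is a minimal sketch, assuming the `openclaw` executable is on PATH and accepts exactly the flags shown above; the helper names `build_validate_cmd` and `run_cli` are hypothetical and not part of any OpenClaw package.

```python
import subprocess
import sys


def build_validate_cmd(data_file, rules_file, report_file):
    """Assemble the argv list for the validate command shown above."""
    return [
        "openclaw", "data-quality", "validate",
        "--file", data_file,
        "--rules", rules_file,
        "--output", report_file,
    ]


def run_cli(cmd):
    """Run a command and return (ok, stderr_text).

    ok is True only when the process exits with code 0; stderr is
    captured so the caller can log or parse the failure details.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr.strip()
```

A pipeline step might call `run_cli(build_validate_cmd("data.csv", "config.json", "report.txt"))` and stop the run when `ok` is false. Note that `subprocess.run` raises `FileNotFoundError` if the executable is not installed, so that case needs its own handling.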
API Endpoint: POST /api/data-quality/validate
Send a JSON body like:
{ "file_url": "s3://bucket/data.csv", "rules": { "columns": ["id", "name"], "checks": ["no_nulls"] } }
Response: JSON with status and errors, e.g., { "status": "success", "errors": [] }.
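For callers that do not use the SDK, the request above could be assembled with the Python standard library. This is a sketch under stated assumptions: the host `https://api.openclaw.ai` is taken from the monitoring curl example in this document, and the Bearer header is assumed to apply to this endpoint as well. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: same host as the monitoring curl example in this document.
BASE_URL = "https://api.openclaw.ai"


def build_validate_request(file_url, columns, checks, api_key):
    """Build (but do not send) a POST request for the validate endpoint."""
    payload = {"file_url": file_url, "rules": {"columns": columns, "checks": checks}}
    return urllib.request.Request(
        BASE_URL + "/api/data-quality/validate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the endpoint accepts the same Bearer token as the GET example.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect in a staging environment before any network call is made.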
API Endpoint: GET /api/data-quality/monitor/{job_id}
Fetch monitoring status:
curl -H "Authorization: Bearer $OPENCLAW_API_KEY" https://api.openclaw.ai/api/data-quality/monitor/12345
Returns metrics like error rate.
Config format: Use JSON for rules, e.g.,
{ "checks": [ { "type": "null_check", "column": "age", "action": "alert" } ] }
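To make the rule format concrete, here is a small local interpretation of the `null_check` rule above. It only illustrates what such a rule could mean for in-memory rows; it is not OpenClaw's implementation, and treating empty strings and missing keys as null is an assumption of this sketch.

```python
import json

# The rule document from the example above.
RULES_JSON = '{ "checks": [ { "type": "null_check", "column": "age", "action": "alert" } ] }'


def find_null_violations(rows, rules):
    """Return (row_index, column, action) for every null_check hit."""
    violations = []
    for check in rules.get("checks", []):
        if check.get("type") != "null_check":
            continue  # other check types are out of scope for this sketch
        column = check["column"]
        for i, row in enumerate(rows):
            # Assumption: None, empty string, and a missing key all count as null.
            if row.get(column) in (None, ""):
                violations.append((i, column, check.get("action")))
    return violations


rows = [{"age": 31}, {"age": None}, {"name": "no age key"}]
rules = json.loads(RULES_JSON)
violations = find_null_violations(rows, rules)
```

With these sample rows, the second and third rows are flagged, each carrying the `alert` action from the rule.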
Integrate with Python using the OpenClaw SDK: Import and initialize with your API key. For example, in a data pipeline:
import os
import openclaw
# Read the API key from the environment rather than hard-coding it.
client = openclaw.Client(api_key=os.environ['OPENCLAW_API_KEY'])
result = client.validate_data(file_path='data.csv', rules={'checks': ['duplicates']})
When a pipeline calls the API asynchronously, check the response status code of each call before acting on its result. For cloud integration, configure webhooks for alerts, e.g., a POST to your endpoint on validation failures. Always test integrations in a staging environment first.
Handle errors by checking return codes from CLI or API responses. For CLI, if a command fails, it exits with a non-zero code; parse stderr for details. In code, catch exceptions like:
try:
    result = client.validate_data(...)
except openclaw.APIError as e:
    log_error(e.message)  # e.message contains error details
Common errors include authentication failures (e.g., 401 Unauthorized) or invalid configs (e.g., 400 Bad Request). Use retry logic for transient issues, like network errors, with exponential backoff.
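The retry advice above can be sketched as a generic helper. This is one possible shape, not the SDK's behavior: `TransientError` is a hypothetical marker exception, and which errors count as transient in a real integration is an assumption to verify. Authentication failures (401) and invalid configs (400) should not be retried, since repeating the call will not fix them.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call func(), retrying transient failures with exponential backoff.

    The delay doubles on each attempt (base_delay, 2x, 4x, ...), is capped
    at max_delay, and gets random jitter of up to half the delay so that
    many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except (TransientError, ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

A call such as `retry_with_backoff(lambda: client.validate_data(...))` would wrap the SDK call from the earlier example, provided the SDK surfaces network failures as one of the caught exception types.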