examples/nvidia_deep_agent/skills/cudf-analytics/SKILL.md
Use for GPU-accelerated data analysis on datasets, CSVs, or tabular data using NVIDIA cuDF. Triggers when tasks involve groupby aggregations, statistical summaries, anomaly detection, or large-scale data profiling.
npx skillsauth add langchain-ai/deepagents cudf-analytics
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
GPU-accelerated data analysis using NVIDIA RAPIDS cuDF. cuDF provides a pandas-like API that runs on NVIDIA GPUs, enabling massive speedups on large datasets.
Use this skill when:
Start every script with this boilerplate. It tests actual GPU operations, not just whether the import succeeds.
import pandas as pd

try:
    import cudf
    # Smoke-test: verify GPU compute AND host transfer both work
    _test = cudf.Series([1, 2, 3])
    assert _test.sum() == 6
    assert _test.to_pandas().tolist() == [1, 2, 3]
    GPU = True
except Exception as e:
    print(f"[GPU] cudf unavailable, falling back to pandas: {e}")
    GPU = False

def read_csv(path):
    return cudf.read_csv(path) if GPU else pd.read_csv(path)

def to_pd(df):
    """Convert cuDF DataFrame/Series to pandas. Use this instead of .to_pandas() directly."""
    if not GPU:
        return df
    try:
        return df.to_pandas()
    except Exception as e:
        print(f"[GPU] .to_pandas() failed, using Arrow fallback: {e}")
        return df.to_arrow().to_pandas()
cuDF mirrors the pandas API. Common operations:
df = read_csv("data.csv")
# Use to_pd() when you need pandas output
summary = to_pd(df[["value", "score"]].describe())
# Scalar values work directly with float()
mean_val = float(df["value"].mean())
q1 = float(df["value"].quantile(0.25))
# Correlation
corr = float(df["value"].corr(df["score"]))
result = df.groupby("category").agg({
    "revenue": ["sum", "mean", "count"],
    "quantity": ["sum", "mean"],
})
result_pd = to_pd(result)
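A grouped aggregation with several functions per column typically comes back with two-level column labels, which can be awkward to report. The sketch below shows one way to flatten them after conversion. It assumes the result is already a plain pandas DataFrame (as returned by to_pd()); the sample data and the `flat` name are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for a loaded dataset.
df = pd.DataFrame({
    "category": ["a", "a", "b"],
    "revenue": [10.0, 20.0, 5.0],
    "quantity": [1, 2, 3],
})

result_pd = df.groupby("category").agg({
    "revenue": ["sum", "mean", "count"],
    "quantity": ["sum", "mean"],
})

# Join the two column levels, e.g. ("revenue", "sum") -> "revenue_sum"
flat = result_pd.copy()
flat.columns = ["_".join(pair) for pair in flat.columns]
flat = flat.reset_index()  # turn the "category" index into a regular column

print(flat)
```

Flat column names such as `revenue_sum` are easier to reference when writing the results out or summarising them in text.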
col = "value"
Q1 = float(df[col].quantile(0.25))
Q3 = float(df[col].quantile(0.75))
IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
outliers = to_pd(df[(df[col] < lower) | (df[col] > upper)])
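To make the IQR rule above concrete, here is a small worked example in plain Python, independent of cuDF. The helper name `iqr_bounds` and the sample values are hypothetical. It uses `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points; this should correspond to the default linear interpolation of `quantile` in pandas-style APIs, but treat that equivalence as an assumption to verify on your own data.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # n=4 yields the three quartile cut points; we only need Q1 and Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

values = [10, 12, 11, 13, 12, 14, 11, 100]  # hypothetical sample
lower, upper = iqr_bounds(values)
outliers = [v for v in values if v < lower or v > upper]

print(lower, upper, outliers)  # 7.625 16.625 [100]
```

Here Q1 is 11.0 and Q3 is 13.25, so the IQR is 2.25 and only the value 100 falls outside the fences.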
mean = float(df[col].mean())
std = float(df[col].std())
df["z_score"] = (df[col] - mean) / std
anomalies = to_pd(df[df["z_score"].abs() > 3])
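The z-score filter above divides by the standard deviation, so a constant column is a degenerate case worth handling explicitly. The following plain-Python sketch illustrates the same rule with such a guard. The function name `zscore_anomalies` and the sample values are hypothetical; `statistics.stdev` is the sample standard deviation (ddof=1), which is also the usual default for `.std()` in pandas-style APIs.

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (ddof=1)
    if std == 0:
        # All values are identical, so nothing can be an anomaly
        return []
    return [v for v in values if abs((v - mean) / std) > threshold]

values = [10] * 20 + [100]  # hypothetical sample with one extreme value
print(zscore_anomalies(values))        # [100]
print(zscore_anomalies([5, 5, 5, 5]))  # []
```

Note that with ddof=1 the largest possible z-score in a sample of n points is (n - 1) / sqrt(n), so a threshold of 3 can never flag anything in very small samples (roughly ten points or fewer).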
# Filter rows
filtered = df[df["status"] == "active"]
# Select columns
subset = df[["name", "revenue", "date"]]
# Sort
sorted_df = df.sort_values("revenue", ascending=False)
# Convert to pandas for final output / iteration
result_pd = to_pd(sorted_df)
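The operations above can be chained so that only a small, already-reduced result is converted for iteration. The sketch below uses plain pandas with hypothetical sample data so it runs without a GPU; in a script that uses the boilerplate, `top` would be passed through to_pd() before the loop.

```python
import pandas as pd

# Hypothetical sample data; in practice this would come from read_csv().
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "revenue": [10.0, 40.0, 30.0, 20.0],
    "status": ["active", "inactive", "active", "active"],
})

# Filter, sort, and trim first, so only two rows need converting
top = (
    df[df["status"] == "active"]
    .sort_values("revenue", ascending=False)
    .head(2)
)

# Iterate over the small result as plain Python tuples
rows = [(r.name, r.revenue) for r in top.itertuples(index=False)]
print(rows)  # [('c', 30.0), ('d', 20.0)]
```

Reducing the data before conversion keeps the device-to-host transfer small, which is usually the reason to delay the conversion until the final step.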
cuDF requires explicit type specification for optimal performance:
- float32 or float64 for numeric data
- int32 or int64 for integer data

When reporting analysis results: