src/dot-agents/skills/python-polars/SKILL.md
Enforces Polars over Pandas for functional pipe-style data manipulation (like dplyr in R). Use when writing Python data processing code, data transformation pipelines, ETL workflows, or analytical queries—e.g., "process this CSV", "aggregate sales data", "filter and transform DataFrame", "group by and calculate metrics".
`npx skillsauth add jjjermiah/dotagents python-polars`

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Enforce Polars as the default for all Python data manipulation. Polars provides a functional, pipe-style API similar to dplyr in R—code reads as a clear series of composable transformations rather than imperative steps. This produces more readable, maintainable, and performant data pipelines.
YOU MUST use Polars for ALL data manipulation in Python. No exceptions for new code.
IMMEDIATELY upon loading this skill:
- … `%>%`)

If you MUST use pandas (ML ecosystem compatibility only):

- … `df_polars.to_pandas()`)

Polars represents the modern functional approach to data manipulation: like dplyr in R, it emphasizes readable pipelines over imperative code. Legacy pandas patterns (in-place mutation, index manipulation, row iteration) are technical debt.
| Principle | Polars (Good) | Pandas (Bad) |
|-----------|---------------|--------------|
| Method chains | Continuous pipeline | Breaking into separate statements |
| Readability | Each step clearly named | Mental state tracking required |
| Expressions | `pl.col("x") * 2` | `df["x"] * 2` (implicit) |
| Immutability | `df.with_columns(...)` returns new | `df["x"] = y` mutates in place |
| Functional | Data flows through transformations | Imperative step-by-step |
GOOD - Clear pipe-style chain:
```python
import polars as pl

result = (
    pl.scan_csv("sales.csv", try_parse_dates=True)  # 1. Start with data source (parse "date" as a date)
    .filter(pl.col("date") >= pl.date(2024, 1, 1))  # 2. Filter to relevant rows
    .with_columns(                                  # 3. Add computed columns
        revenue=pl.col("units") * pl.col("price"),
        is_weekend=pl.col("date").dt.weekday().is_in([6, 7]),  # ISO weekday: 6=Sat, 7=Sun
    )
    .group_by(["region", "is_weekend"])             # 4. Group for aggregation
    .agg(                                           # 5. Calculate metrics
        total_revenue=pl.col("revenue").sum(),
        order_count=pl.col("order_id").count(),
        avg_order=pl.col("revenue").mean(),
    )
    .filter(pl.col("order_count") > 10)             # 6. Filter aggregated results
    .sort(["region", "total_revenue"], descending=[False, True])
    .collect()                                      # 7. Execute pipeline
)
```
BAD - Broken into imperative steps:
```python
import pandas as pd  # WRONG - using pandas

df = pd.read_csv("sales.csv", parse_dates=["date"])  # Eager load
df = df[df["date"] >= "2024-01-01"]                  # Filter
df["revenue"] = df["units"] * df["price"]            # Mutate
df["is_weekend"] = df["date"].dt.weekday.isin([5, 6])  # More mutation (pandas: 5=Sat, 6=Sun)
result = df.groupby(["region", "is_weekend"]).agg({  # Group and aggregate
    "revenue": ["sum", "mean", "count"]
})
```
- `pl.scan_csv()`, `pl.scan_parquet()` for large data
- `pl.col("name")` over string indexing
- `.with_columns()`
- `context7_query-docs(libraryId="/pola-rs/polars", query="...")`
- `iter_rows()` - use vectorized expressions
- `apply()` - use native Polars expressions

| Operation | Pandas (Imperative) | Polars (Pipe-Style) |
|-----------|---------------------|---------------------|
| Read CSV (large) | `pd.read_csv()` then filter | `pl.scan_csv().filter().collect()` |
| Select columns | `df[["a", "b"]]` | `df.select("a", "b")` |
| Filter rows | `df[df.a > 10]` | `df.filter(pl.col("a") > 10)` |
| Add column | `df["c"] = df.a + df.b` | `df.with_columns(c=pl.col("a") + pl.col("b"))` |
| Group by + agg | `df.groupby("x").y.sum()` | `df.group_by("x").agg(pl.col("y").sum())` |
| Window/rank | `df.groupby("x").y.rank()` | `df.with_columns(pl.col("y").rank().over("x"))` |
| Conditional | `np.where(df.a > 10, "high", "low")` | `pl.when(pl.col("a") > 10).then(pl.lit("high")).otherwise(pl.lit("low"))` |
| Join | `df1.merge(df2, on="id")` | `df1.join(df2, on="id")` |
| Sort | `df.sort_values("a")` | `df.sort("a")` |
| Drop duplicates | `df.drop_duplicates()` | `df.unique()` |
| Missing values | `df.fillna(0)` | `df.fill_null(0)` |
Master these patterns for pipe-style code:
```python
import polars as pl

# Column reference and arithmetic
pl.col("revenue") / pl.col("units")

# Conditional logic (CASE WHEN equivalent); wrap string literals in pl.lit(),
# because bare strings in then()/otherwise() are parsed as column names
pl.when(pl.col("age") >= 18).then(pl.lit("adult")).otherwise(pl.lit("minor"))

# String operations
pl.col("name").str.to_uppercase()
pl.col("email").str.contains("@")

# Date/time operations
pl.col("timestamp").dt.year()
pl.col("date").dt.truncate("1d")

# Aggregations (use in .agg() context)
pl.col("value").sum()
pl.col("value").mean()
pl.col("id").n_unique()

# Window functions (use .over() for group transforms)
pl.col("value").sum().over("category")   # Category total per row
pl.col("value").rank().over("category")  # Rank within category
```
YOU MUST query Context7 for Polars APIs before writing implementation code.
```
# Resolve library ID first
context7_resolve-library-id(libraryName="polars", query="Polars DataFrame library")

# Then query for specific operations
context7_query-docs(libraryId="/pola-rs/polars", query="lazy scan filter example")
```
Context7 provides current API documentation. Pre-trained knowledge of Polars APIs may be stale, because the API changes between releases (for example, `groupby` was renamed to `group_by`), and stale calls cause bugs.