external/anthropic-cybersecurity-skills/skills/building-role-mining-for-rbac-optimization/SKILL.md
Apply bottom-up and top-down role mining techniques to discover optimal RBAC roles from existing user-permission assignments, reducing role explosion and enforcing least privilege.
npx skillsauth add seikaikyo/dash-skills building-role-mining-for-rbac-optimizationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Role mining is the process of analyzing existing user-permission assignments to discover optimal roles for a Role-Based Access Control (RBAC) system. Organizations accumulate excessive permissions over time through job changes, project assignments, and ad-hoc access grants, leading to "role explosion" where thousands of granular roles exist with significant overlap. Role mining uses data analysis -- including clustering algorithms, formal concept analysis, and graph-based methods -- to consolidate permissions into a minimal set of roles that accurately represent business functions while enforcing least privilege.
| Approach | Description | Best For | |----------|-------------|----------| | Bottom-Up | Analyze existing permissions to discover common patterns | Large datasets with organic permission growth | | Top-Down | Design roles from business requirements and job descriptions | Greenfield RBAC or organizational restructuring | | Hybrid | Combine bottom-up analysis with top-down business validation | Most production environments |
1. Permission Clustering: Group users with similar permission sets using k-means or hierarchical clustering. Users in the same cluster share a common role.
2. Formal Concept Analysis (FCA): Mathematical framework that identifies complete set of concepts (user groups sharing exact permission sets) from a binary user-permission matrix.
3. Graph-Based Mining: Model users and permissions as a bipartite graph, then find dense subgraphs representing candidate roles.
4. Boolean Matrix Decomposition: Decompose the user-permission matrix U into U ≈ R × P where R maps users to roles and P maps roles to permissions.
| Metric | Formula | Target | |--------|---------|--------| | Role Count | Total distinct roles after mining | Minimize | | Coverage | Permissions explained by mined roles / Total permissions | > 95% | | Weighted Structural Complexity (WSC) | Sum of role-user + role-permission assignments | Minimize | | Deviation | Extra permissions not covered by assigned roles | < 5% |
Collect the current access state from all identity sources:
import pandas as pd
import numpy as np
# Load user-permission assignments
# Format: user_id, permission_id (one row per assignment)
assignments = pd.read_csv("user_permissions.csv")
# Create binary user-permission matrix (UPA matrix)
upa_matrix = assignments.pivot_table(
index="user_id",
columns="permission_id",
aggfunc="size",
fill_value=0
)
upa_matrix = (upa_matrix > 0).astype(int)
print(f"Users: {upa_matrix.shape[0]}")
print(f"Permissions: {upa_matrix.shape[1]}")
print(f"Assignments: {assignments.shape[0]}")
print(f"Density: {upa_matrix.values.sum() / upa_matrix.size:.2%}")
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
def find_optimal_clusters(matrix, max_k=50):
"""Find optimal number of roles using silhouette analysis."""
scores = []
for k in range(2, min(max_k, matrix.shape[0])):
clustering = AgglomerativeClustering(
n_clusters=k, metric="jaccard", linkage="average"
)
labels = clustering.fit_predict(matrix)
score = silhouette_score(matrix, labels, metric="jaccard")
scores.append((k, score))
optimal_k = max(scores, key=lambda x: x[1])[0]
return optimal_k, scores
def mine_roles_clustering(upa_matrix, n_clusters):
"""Mine roles using hierarchical clustering on Jaccard distance."""
clustering = AgglomerativeClustering(
n_clusters=n_clusters, metric="jaccard", linkage="average"
)
user_matrix = upa_matrix.values
labels = clustering.fit_predict(user_matrix)
roles = {}
for cluster_id in range(n_clusters):
cluster_users = upa_matrix.index[labels == cluster_id]
cluster_permissions = upa_matrix.loc[cluster_users]
# Core role = permissions held by >80% of cluster members
permission_frequency = cluster_permissions.mean()
core_permissions = permission_frequency[permission_frequency >= 0.8].index.tolist()
roles[f"Role_{cluster_id}"] = {
"permissions": core_permissions,
"user_count": len(cluster_users),
"users": cluster_users.tolist(),
"coverage": permission_frequency[permission_frequency >= 0.8].mean()
}
return roles, labels
def mine_roles_fca(upa_matrix, min_support=3):
"""Mine roles using Formal Concept Analysis (frequent closed itemsets)."""
from itertools import combinations
users = upa_matrix.index.tolist()
permissions = upa_matrix.columns.tolist()
concepts = []
# Find all maximal permission sets shared by at least min_support users
for size in range(len(permissions), 0, -1):
for perm_combo in combinations(permissions, size):
perm_set = set(perm_combo)
# Find users who have ALL permissions in this set
matching_users = []
for user in users:
user_perms = set(upa_matrix.columns[upa_matrix.loc[user] == 1])
if perm_set.issubset(user_perms):
matching_users.append(user)
if len(matching_users) >= min_support:
# Check if this is a closed concept (no superset with same extent)
is_closed = True
for concept in concepts:
if set(matching_users) == set(concept["users"]) and \
perm_set.issubset(set(concept["permissions"])):
is_closed = False
break
if is_closed:
concepts.append({
"permissions": list(perm_set),
"users": matching_users,
"support": len(matching_users)
})
if len(concepts) > 100: # Limit for performance
break
return concepts
def evaluate_role_set(roles, upa_matrix):
"""Evaluate the quality of a mined role set."""
total_assignments = upa_matrix.values.sum()
covered_assignments = 0
extra_assignments = 0
for role_name, role_data in roles.items():
role_perms = set(role_data["permissions"])
for user in role_data["users"]:
user_perms = set(upa_matrix.columns[upa_matrix.loc[user] == 1])
covered = role_perms.intersection(user_perms)
extra = role_perms - user_perms
covered_assignments += len(covered)
extra_assignments += len(extra)
metrics = {
"total_roles": len(roles),
"total_assignments": total_assignments,
"covered_assignments": covered_assignments,
"coverage_rate": covered_assignments / total_assignments if total_assignments else 0,
"extra_permissions": extra_assignments,
"deviation_rate": extra_assignments / (covered_assignments + extra_assignments) if (covered_assignments + extra_assignments) else 0,
"avg_role_size": np.mean([len(r["permissions"]) for r in roles.values()]),
"avg_users_per_role": np.mean([r["user_count"] for r in roles.values()]),
}
return metrics
After mining candidate roles:
tools
Zero-Knowledge Proofs (ZKPs) allow a prover to demonstrate knowledge of a secret (such as a password or private key) without revealing the secret itself. This skill implements the Schnorr identificati
development
Configure ModSecurity WAF with OWASP Core Rule Set (CRS) for web application logging, tune rules to reduce false positives, analyze audit logs for attack detection, and implement custom SecRules for application-specific threats. The analyst configures SecRuleEngine, SecAuditEngine, and CRS paranoia levels to balance security coverage with operational stability. Activates for requests involving WAF configuration, ModSecurity rule tuning, web application audit logging, or CRS deployment.
development
Build automated alerting for vulnerability remediation SLA breaches with severity-based timelines, escalation workflows, and compliance reporting dashboards.
testing
Vulnerability remediation SLAs define mandatory timeframes for patching or mitigating identified vulnerabilities based on severity, asset criticality, and exploit availability. Effective SLA programs