skills/utility/text-metrics/SKILL.md
Probability-based text analysis providing GLTR token rank histograms, DetectGPT curvature probes, and Coh-Metrix-inspired cohesion metrics. Designed to compose with ai-check for comprehensive AI writing pattern detection.
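A utility skill providing advanced statistical and model-based text analysis for AI detection. It implements three core capabilities grounded in peer-reviewed research:
- GLTR token rank histograms (Gehrmann et al., 2019)
- DetectGPT probability-curvature probes (Mitchell et al., 2023)
- Coh-Metrix-inspired cohesion metrics (McNamara et al., 2014)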
This skill is designed to compose with the ai-check skill to provide Dimension 5 (Probability-Based) detection features. It can also be used standalone for text analysis research.
This skill is typically invoked by other skills (particularly ai-check) rather than directly by users. It may be invoked when:
Reference: Gehrmann, S., Strobelt, H., & Rush, A. M. (2019). "GLTR: Statistical Detection and Visualization of Generated Text." ACL 2019.
Principle: LLM-generated text tends to select higher-probability tokens (lower ranks) more consistently than human writing, which exhibits more variability and surprisal.
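Method: For each token, compute its rank in the model's predicted next-token distribution (rank 1 is the model's top choice). Then report the percentage of tokens whose rank falls within the top 10, top 100 and top 1000, plus the remainder.
Typical Patterns: AI-generated text concentrates more of its tokens in the top-10 bucket. Human writing places more tokens in the lower-probability buckets. The ai-check integration flags top-10 concentration above 40%.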
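The rank-bucketing step can be sketched without any model dependency. The sketch assumes you already have two inputs: the model's next-token scores (logits or probabilities) at each position, and the ID of the token that actually occurred. Obtaining those scores from a Hugging Face model is omitted. `token_rank` and `rank_histogram` are hypothetical helpers, not functions from `scripts/text_metrics.py`. The buckets are cumulative, which matches the sample `token_rank_hist` output in this skill (`top1000_pct + rest_pct = 100`).

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def token_rank(scores: Sequence[float], actual_id: int) -> int:
    """1-based rank of the observed token: 1 + number of vocabulary entries scored higher."""
    target = scores[actual_id]
    return 1 + sum(1 for s in scores if s > target)


def rank_histogram(all_scores: List[Sequence[float]],
                   actual_ids: List[int]) -> Dict[str, float]:
    """GLTR-style histogram from per-position next-token scores and observed token IDs."""
    ranks = [token_rank(scores, tok) for scores, tok in zip(all_scores, actual_ids)]
    if not ranks:
        raise ValueError("need at least one scored token")
    n = len(ranks)

    def pct(count: int) -> float:
        return 100.0 * count / n

    top1000 = sum(r <= 1000 for r in ranks)
    return {
        # Cumulative buckets: every top-10 token is also counted in top-100 and top-1000.
        "top10_pct": pct(sum(r <= 10 for r in ranks)),
        "top100_pct": pct(sum(r <= 100 for r in ranks)),
        "top1000_pct": pct(top1000),
        "rest_pct": pct(n - top1000),
        "mean_rank": float(mean(ranks)),
        "median_rank": float(median(ranks)),
    }


# Toy example: a 5-token vocabulary and two positions.
scores = [[0.1, 0.5, 0.2, 0.9, 0.0],   # observed token 3 has the highest score -> rank 1
          [3.0, 2.0, 1.0, 0.0, -1.0]]  # observed token 4 has the lowest score -> rank 5
print(rank_histogram(scores, [3, 4]))
```

Ties are resolved in the observed token's favor, because only strictly higher scores count against it. With real vocabularies of tens of thousands of entries, exact ties are rare enough that this choice barely matters.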
Reference: Mitchell, E., et al. (2023). "DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature." ICML 2023.
Principle: Generated text sits at local maxima in the model's probability distribution, while human text does not. Random perturbations of generated text tend to have lower probability.
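Method:
1. Compute the log-probability $\log P(x)$ of the original text $x$.
2. Generate $n$ random perturbations $x'_1, x'_2, \dots, x'_n$.
3. Compute the mean perturbed log-probability $\frac{1}{n}\sum_{i=1}^{n} \log P(x'_i)$.
4. Report the curvature:

$$d(x) = \log P(x) - \frac{1}{n}\sum_{i=1}^{n} \log P(x'_i)$$

Interpretation: A large positive curvature means the original text scores well above its perturbations, which suggests machine-generated text. The ai-check integration flags curvature above 0.5. Values near zero are more typical of human writing.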
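The four steps can be sketched as follows. This is a minimal illustration that assumes any callable `logprob_fn` returning a text's log-probability under the scoring model. Perturbations use crude word replacement, in the spirit of the simple perturbations this skill says it uses. This differs from the T5 mask-filling used in the DetectGPT paper. `perturb_words`, `curvature_score`, `FILLER` and the toy scorer are all hypothetical.

```python
import random
from statistics import fmean
from typing import Callable, Dict, Sequence

# Hypothetical filler vocabulary for crude word-replacement perturbations.
FILLER = ("the", "a", "of", "and", "it")


def perturb_words(text: str, rng: random.Random, frac: float = 0.15,
                  vocab: Sequence[str] = FILLER) -> str:
    """Replace about `frac` of the words (at least one) with random filler words."""
    words = text.split()
    if not words:
        return text
    k = max(1, int(len(words) * frac))
    for idx in rng.sample(range(len(words)), k):
        words[idx] = rng.choice(vocab)
    return " ".join(words)


def curvature_score(text: str, logprob_fn: Callable[[str], float],
                    num_perturbations: int = 10, seed: int = 0) -> Dict[str, float]:
    """Estimate log P(x) - mean_i log P(x'_i) using `num_perturbations` perturbed copies."""
    if num_perturbations < 1:
        raise ValueError("num_perturbations must be at least 1")
    rng = random.Random(seed)  # seeded so repeated calls give the same perturbations
    original = logprob_fn(text)
    perturbed = [logprob_fn(perturb_words(text, rng)) for _ in range(num_perturbations)]
    mean_perturbed = fmean(perturbed)
    return {
        "original_logprob": original,
        "mean_perturbed_logprob": mean_perturbed,
        "curvature": original - mean_perturbed,
    }


# Toy scorer (hypothetical): each word outside a small known set costs -5.
KNOWN = {"the", "cat", "sat", "on", "mat"}


def toy_logprob(t: str) -> float:
    return sum(0.0 if w in KNOWN else -5.0 for w in t.split())


print(curvature_score("the cat sat on the mat", toy_logprob, num_perturbations=5))
```

The toy scorer assigns the original sentence its maximum score, so every perturbation scores the same or lower and the curvature is non-negative. This mirrors the local-maximum intuition behind DetectGPT.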
Reference: McNamara, D. S., et al. (2014). "Automated Evaluation of Text and Discourse with Coh-Metrix." Cambridge University Press.
Principle: Discourse cohesion patterns (connectives, referential overlap, lexical diversity) differ between human and AI writing.
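Metrics:
- Connective rates: additive, temporal, causal, adversative and total
- Lexical diversity: type-token ratio, unique lemmas, hapax legomena
- Referential cohesion: pronoun rate, unique pronoun types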
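Below is a rough sketch of how these cohesion metrics can be computed. It assumes simple regex tokenization and tiny illustrative connective and pronoun lists. The real word lists and lemmatization in `scripts/text_metrics.py` are not shown in this document, so `cohesion_sketch`, `CONNECTIVES` and `PRONOUNS` are hypothetical. Rates are expressed per 1,000 words, following the Coh-Metrix incidence convention. Whether the skill's script uses the same denominator is not stated here.

```python
import re
from collections import Counter
from typing import Dict

# Illustrative single-word lists only; a real implementation would use far larger
# lists, multi-word connectives ("as a result"), and lemmatization.
CONNECTIVES = {
    "additive": {"also", "moreover", "furthermore", "additionally"},
    "temporal": {"then", "meanwhile", "afterwards", "finally"},
    "causal": {"because", "therefore", "thus", "consequently"},
    "adversative": {"however", "but", "although", "nevertheless"},
}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "me", "him", "her", "us", "them"}


def cohesion_sketch(text: str) -> Dict[str, Dict[str, float]]:
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    if n == 0:
        raise ValueError("text contains no words")
    counts = Counter(words)  # missing keys read as 0 without being inserted

    def per_1000(count: int) -> float:
        # Incidence per 1,000 words
        return 1000.0 * count / n

    connectives = {
        f"{kind}_rate": per_1000(sum(counts[w] for w in vocab))
        for kind, vocab in CONNECTIVES.items()
    }
    connectives["total_rate"] = sum(connectives.values())

    lexical = {
        "type_token_ratio": len(counts) / n,
        "unique_words": len(counts),  # word types, not lemmas (no lemmatizer here)
        "hapax_legomena": sum(1 for c in counts.values() if c == 1),
    }

    pronoun_counts = {w: c for w, c in counts.items() if w in PRONOUNS}
    referential = {
        "pronoun_rate": per_1000(sum(pronoun_counts.values())),
        "unique_pronoun_types": len(pronoun_counts),
    }
    return {"connectives": connectives,
            "lexical_diversity": lexical,
            "referential_cohesion": referential}


print(cohesion_sketch("The cat sat. However, it ran because it was scared."))
```

The sketch reports `unique_words` instead of `unique_lemmas`. Counting lemmas requires a lemmatizer, which this sketch deliberately avoids.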
`token_rank_hist(text, model_name="gpt2")`
Compute a GLTR-style token rank histogram.
Input:
- `text`: String to analyze
- `model_name`: Hugging Face model identifier (default: `"gpt2"`)

Output (bucket percentages are cumulative, so `top1000_pct + rest_pct = 100`):
```json
{
  "top10_pct": 45.2,
  "top100_pct": 87.3,
  "top1000_pct": 96.1,
  "rest_pct": 3.9,
  "mean_rank": 182.4,
  "median_rank": 34.0
}
```
Usage:
```python
from scripts.text_metrics import token_rank_hist

result = token_rank_hist("Your text here", model_name="gpt2")
print(f"Top-10 concentration: {result['top10_pct']:.1f}%")
```
`detectgpt_score(text, model_name="gpt2", num_perturbations=10)`
Compute the DetectGPT curvature criterion.
Input:
- `text`: String to analyze
- `model_name`: Hugging Face model identifier (default: `"gpt2"`)
- `num_perturbations`: Number of random perturbations (default: 10)

Output:
```json
{
  "original_logprob": -2.34,
  "mean_perturbed_logprob": -3.12,
  "curvature": 0.78
}
```
Usage:
```python
from scripts.text_metrics import detectgpt_score

result = detectgpt_score("Your text here", num_perturbations=20)
# 0.5 matches the cutoff used in the ai-check integration example
if result['curvature'] > 0.5:
    print("Likely AI-generated")
```
`cohesion_bundle(text)`
Compute Coh-Metrix-inspired cohesion metrics.
Input:
- `text`: String to analyze

Output:
```json
{
  "connectives": {
    "additive_rate": 12.5,
    "temporal_rate": 8.3,
    "causal_rate": 6.2,
    "adversative_rate": 9.1,
    "total_rate": 36.1
  },
  "lexical_diversity": {
    "type_token_ratio": 0.62,
    "unique_lemmas": 142,
    "hapax_legomena": 78
  },
  "referential_cohesion": {
    "pronoun_rate": 45.2,
    "unique_pronoun_types": 12
  }
}
```
Usage:
```python
from scripts.text_metrics import cohesion_bundle

result = cohesion_bundle("Your text here")
diversity = result['lexical_diversity']['type_token_ratio']
print(f"Lexical diversity: {diversity:.2f}")
```
`full_analysis(text, model_name="gpt2", num_perturbations=10)`
Run all three analyses in one call.
Output: A combined dictionary whose `token_ranks`, `curvature` and `cohesion` entries hold the results of the three functions above.
When the ai-check skill invokes text-metrics:
```python
# ai-check calls text-metrics for Dimension 5.
# invoke_skill and the flag_* helpers are assumed to be provided by ai-check;
# they are not defined by this skill.
token_data = invoke_skill("text-metrics", "token_rank_hist", text)
curvature = invoke_skill("text-metrics", "detectgpt_score", text)
cohesion = invoke_skill("text-metrics", "cohesion_bundle", text)

# Use the results in probability-dimension scoring
if token_data['top10_pct'] > 40:
    flag_high_probability_concentration()
if curvature['curvature'] > 0.5:
    flag_curvature_anomaly()
```
Recommendation: Use gpt2 for speed, gpt2-medium for accuracy.
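Text-metrics benefits significantly from a GPU. In the benchmarks below, the model-based functions run roughly five times faster than on CPU. Install PyTorch with CUDA support to enable GPU acceleration.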
| Function | Time (CPU) | Time (GPU) | Memory |
|----------|-----------|-----------|---------|
| token_rank_hist | 3-5s | 0.5-1s | ~1GB |
| detectgpt_score | 5-10s | 1-2s | ~1GB |
| cohesion_bundle | <0.1s | <0.1s | <100MB |
| full_analysis | 8-15s | 1.5-3s | ~1GB |
Times are for documents of ~500 tokens using the gpt2 model.
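Before relying on the GPU timings, you can confirm that the installed PyTorch build actually sees a CUDA device. This minimal sketch uses the standard `torch.cuda.is_available()` check. `pick_device` is a hypothetical helper, and this document does not say how `scripts/text_metrics.py` chooses its device.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Model inference would run on: {pick_device()}")
```

If this prints `cpu` on a machine that has a GPU, the installed PyTorch build most likely lacks CUDA support.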
Model Dependency: Results depend on the choice of language model. A mismatch between the evaluation model and the model that generated the text reduces accuracy.
Short Text: DetectGPT requires ~100+ tokens for reliable results. Token rank histograms need ~50+ tokens.
Computational Cost: Model inference is expensive. Cache results when possible.
Perturbation Quality: This implementation uses simple word-replacement perturbations. Production systems should use mask-filling with a dedicated model, as the original DetectGPT work does with T5.
Language: Currently English-only. Other languages require multilingual models.
Post-Editing: Heavily edited AI text may evade detection as curvature flattens.
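A caller can check these minimum lengths before running the model-based analyses. This sketch assumes that whitespace-separated words are an acceptable stand-in for model tokens. GPT-2's BPE tokenizer usually yields more tokens than words, so the check errs on the strict side. `check_length` and `MIN_TOKENS` are hypothetical names.

```python
from typing import Dict

# Minimum lengths quoted in the Short Text limitation above.
MIN_TOKENS = {"detectgpt_score": 100, "token_rank_hist": 50}


def check_length(text: str) -> Dict[str, bool]:
    """Report, per analysis, whether `text` likely meets the suggested minimum length.

    Whitespace-separated words stand in for model tokens, which slightly
    undercounts real BPE tokens and so makes the check conservative.
    """
    approx_tokens = len(text.split())
    return {name: approx_tokens >= minimum for name, minimum in MIN_TOKENS.items()}


print(check_length("word " * 60))
```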
Example:
```python
from scripts.text_metrics import full_analysis

text = """
Your sample text here. Should be at least 100 tokens
for reliable DetectGPT results.
"""

results = full_analysis(text, model_name="gpt2", num_perturbations=20)

print("Token Rank Histogram:")
print(f"  Top-10: {results['token_ranks']['top10_pct']:.1f}%")
print(f"  Top-100: {results['token_ranks']['top100_pct']:.1f}%")

print("\nDetectGPT Curvature:")
print(f"  Curvature: {results['curvature']['curvature']:.2f}")

print("\nCohesion Metrics:")
print(f"  Type-Token Ratio: {results['cohesion']['lexical_diversity']['type_token_ratio']:.2f}")
print(f"  Connective Rate: {results['cohesion']['connectives']['total_rate']:.1f}/1000")
```
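Using text-metrics from ai-check:
```python
# In the ai-check skill, invoke text-metrics for the probability dimension.
# test_text, invoke_skill and compute_probability_dimension are assumed to be
# provided by ai-check; they are not part of this skill.
token_metrics = invoke_skill("text-metrics", "token_rank_hist", test_text)
curvature_metrics = invoke_skill("text-metrics", "detectgpt_score", test_text)
cohesion_metrics = invoke_skill("text-metrics", "cohesion_bundle", test_text)

# Score the probability dimension
probability_score = compute_probability_dimension(
    token_metrics,
    curvature_metrics,
    cohesion_metrics,
)
```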
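`compute_probability_dimension` is called here but not defined by this skill, and its actual scoring rule is not documented. One possible hypothetical implementation counts how many of the two flags from the ai-check integration example fire: top-10 concentration above 40% and curvature above 0.5. It then reports that count as a fraction. The cohesion metrics are passed through unscored because this document gives no threshold for them.

```python
from typing import Any, Dict

# Thresholds taken from the ai-check integration example in this skill.
TOP10_THRESHOLD = 40.0
CURVATURE_THRESHOLD = 0.5


def compute_probability_dimension(token_metrics: Dict[str, float],
                                  curvature_metrics: Dict[str, float],
                                  cohesion_metrics: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical scoring: the fraction of probability flags that fire (0.0 to 1.0)."""
    flags = {
        "high_probability_concentration": token_metrics["top10_pct"] > TOP10_THRESHOLD,
        "curvature_anomaly": curvature_metrics["curvature"] > CURVATURE_THRESHOLD,
    }
    return {
        "score": sum(flags.values()) / len(flags),
        "flags": flags,
        # No cohesion threshold is given in this document, so these pass through unscored.
        "cohesion": cohesion_metrics,
    }


# Using the sample outputs shown for token_rank_hist and detectgpt_score:
result = compute_probability_dimension({"top10_pct": 45.2}, {"curvature": 0.78}, {})
print(result["score"], result["flags"])
```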
Files:
- `scripts/text_metrics.py`
- `scripts/requirements.txt`
- `examples/usage_example.md`

Installation:
```bash
cd skills/utility/text-metrics/scripts
pip install -r requirements.txt
# The first run downloads the model (~500MB for gpt2)
python text_metrics.py
```
Solution: Check your internet connection. After the first download, models are cached in ~/.cache/huggingface/.
Solution: Use a smaller model (gpt2 instead of gpt2-large) or reduce num_perturbations.
Solution: Install PyTorch with CUDA for GPU acceleration, or use a CPU-optimized build.
Solution: Different models produce different ranks, so use the same model as the reference.
Gehrmann, S., Strobelt, H., & Rush, A. M. (2019). "GLTR: Statistical Detection and Visualization of Generated Text." Proceedings of ACL 2019.
Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D., & Finn, C. (2023). "DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature." Proceedings of ICML 2023.
McNamara, D. S., Graesser, A. C., McCarthy, P. M., & Cai, Z. (2014). "Automated Evaluation of Text and Discourse with Coh-Metrix." Cambridge University Press.
Version: 1.0.0
License: MIT
Maintained by: Claude Skills Library