skills-il/israeli-chatbot-analytics

Name: israeli-chatbot-analytics
Author: skills-il

israeli-chatbot-analytics/SKILL.md

npx skillsauth add skills-il/developer-tools israeli-chatbot-analytics

Israeli Chatbot Analytics

Analyze and optimize Hebrew chatbot performance. This skill covers conversation flow analytics, Hebrew-specific sentiment analysis, drop-off detection, user satisfaction scoring, A/B testing for Hebrew response variants, intent recognition accuracy tracking, anomaly alerting, and reporting dashboards. Use it to understand whether your Hebrew chatbot is actually helping users and where to focus improvements.

Instructions

Step 1: Collect and Structure Conversation Logs

Before analyzing, ensure conversation data is structured consistently. Each conversation session should include:

# Standard conversation log schema
conversation_log = {
    "session_id": "uuid-string",
    "user_id": "anonymous-or-identified",
    "channel": "whatsapp|telegram|web|app",
    "language": "he",           # Primary language detected
    "started_at": "ISO-8601",
    "ended_at": "ISO-8601",
    "messages": [
        {
            "timestamp": "ISO-8601",
            "sender": "user|bot",
            "text": "שלום, אני צריך עזרה",
            "intent": "greeting",           # Detected intent
            "intent_confidence": 0.92,       # Model confidence
            "entities": [],                  # Extracted entities
            "response_time_ms": 340,         # Bot response latency
        }
    ],
    "outcome": "resolved|escalated|abandoned|unknown",
    "satisfaction_score": None,   # CSAT score if collected (None when absent)
    "metadata": {
        "bot_version": "2.1.0",
        "ab_variant": "formal_he",
    }
}

If your platform does not export in this format, write a transformer to normalize logs before analysis. Common platforms and their export formats:

| Platform | Export Method | Format |
|----------|---------------|--------|
| Dialogflow CX | BigQuery export | JSON rows with session context. Use the he-il language code on new agents; iw is deprecated and frozen for new features (https://docs.cloud.google.com/dialogflow/cx/docs/reference/language). |
| Rasa Pro / CALM | Analytics dashboard + tracker events | Flow-step events (Rasa Pro 3.x with CALM is dialogue-driven, not intent-driven, so legacy intent-accuracy metrics map differently). |
| Rasa Open Source (legacy) | Tracker Store (SQL/Mongo) | Events list per conversation. Rasa OSS entered maintenance mode in 2025; see https://legacy-docs-oss.rasa.com/docs/rasa/. |
| Botpress | Conversation export / DB | JSON. Hebrew is listed as a supported language, but full RTL alignment in the default webchat is still a community-reported gap as of 2026. Verify message bubble alignment in your widget before reporting on dialect distribution. |
| Custom bots | Application logs | Varies (normalize to schema above) |
| WhatsApp Cloud API | Webhook logs | Message objects with metadata. See "WhatsApp Business Platform pricing notes" below for the per-message cost model that started July 2025. |
| ManyChat | Audience + flow exports | CSV/JSON. WhatsApp send-out costs flow through Meta's per-message tariff. |

Step 2: Conversation Flow Analysis

Analyze session-level metrics to understand overall chatbot health:

Build a ConversationMetrics dataclass that tracks total_sessions, completed_sessions, escalated_sessions, abandoned_sessions, session_lengths (per-session message count), and session_durations (seconds). Derive rate properties (completion_rate, escalation_rate, abandonment_rate) as count / total_sessions, and avg_session_length / median_session_duration_seconds from the list fields.

compute_flow_metrics(conversations) iterates the structured logs once, increments the right outcome counter (resolved / escalated / abandoned), appends message count and (ended_at - started_at).total_seconds(), and returns the metrics object.
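
The two paragraphs above can be sketched as follows. This is a minimal illustration rather than the bundled script: it assumes `started_at` / `ended_at` are ISO-8601 strings that `datetime.fromisoformat` can parse, and the private `_rate` helper is a hypothetical name introduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean, median


@dataclass
class ConversationMetrics:
    total_sessions: int = 0
    completed_sessions: int = 0
    escalated_sessions: int = 0
    abandoned_sessions: int = 0
    session_lengths: list = field(default_factory=list)    # messages per session
    session_durations: list = field(default_factory=list)  # seconds per session

    def _rate(self, count: int) -> float:
        # Hypothetical helper: avoid ZeroDivisionError on an empty log set.
        return count / self.total_sessions if self.total_sessions else 0.0

    @property
    def completion_rate(self) -> float:
        return self._rate(self.completed_sessions)

    @property
    def escalation_rate(self) -> float:
        return self._rate(self.escalated_sessions)

    @property
    def abandonment_rate(self) -> float:
        return self._rate(self.abandoned_sessions)

    @property
    def avg_session_length(self) -> float:
        return mean(self.session_lengths) if self.session_lengths else 0.0

    @property
    def median_session_duration_seconds(self) -> float:
        return median(self.session_durations) if self.session_durations else 0.0


def compute_flow_metrics(conversations) -> ConversationMetrics:
    """Single pass over logs that follow the Step 1 schema."""
    metrics = ConversationMetrics()
    for conv in conversations:
        metrics.total_sessions += 1
        outcome = conv.get("outcome")
        if outcome == "resolved":
            metrics.completed_sessions += 1
        elif outcome == "escalated":
            metrics.escalated_sessions += 1
        elif outcome == "abandoned":
            metrics.abandoned_sessions += 1
        metrics.session_lengths.append(len(conv.get("messages", [])))
        # Assumption: timestamps are fromisoformat-compatible strings.
        started = datetime.fromisoformat(conv["started_at"])
        ended = datetime.fromisoformat(conv["ended_at"])
        metrics.session_durations.append((ended - started).total_seconds())
    return metrics
```

Sessions with outcome "unknown" count toward total_sessions but toward none of the three rates, so the rates need not sum to 1.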

Key benchmarks for Hebrew chatbots (Israeli market, 2025-2026):

| Metric | Good | Average | Needs Improvement |
|--------|------|---------|-------------------|
| Completion rate | > 70% | 50-70% | < 50% |
| Escalation rate | < 15% | 15-30% | > 30% |
| Abandonment rate | < 20% | 20-35% | > 35% |
| Avg session length | 4-8 messages | 8-15 messages | > 15 messages |
| First-contact resolution | > 65% | 45-65% | < 45% |

Step 3: Drop-off Point Detection

Identify where users abandon conversations. This reveals UX problems, confusing prompts, or missing capabilities:

detect_drop_off_points(conversations) filters to outcome == "abandoned" and returns three Counter.most_common slices: drop-off by conversation depth (message count), by active intent at drop (walking from the tail to the first message with an intent), and by last bot message (first 80 chars, walking from the tail for the last sender == "bot").

detect_conversation_loops(conversations, threshold=3) flags sessions where the bot repeats the same text ≥ threshold times in a row by scanning the bot-message stream and tracking a consecutive-repeat counter; emit {session_id, repeated_message, repeat_count, total_messages} for each looped session.
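
One possible shape for the two detectors described above, assuming the Step 1 log schema. The `top_n` parameter and the result-dictionary keys of `detect_drop_off_points` are illustrative choices, not names taken from the bundled script.

```python
from collections import Counter


def detect_drop_off_points(conversations, top_n=5):
    """Summarize where abandoned sessions ended (top_n is an assumed parameter)."""
    by_depth, by_intent, by_last_bot = Counter(), Counter(), Counter()
    for conv in conversations:
        if conv.get("outcome") != "abandoned":
            continue
        messages = conv.get("messages", [])
        by_depth[len(messages)] += 1
        # Walk from the tail to the most recent message that carries an intent.
        for msg in reversed(messages):
            if msg.get("intent"):
                by_intent[msg["intent"]] += 1
                break
        # Walk from the tail to the last bot message; keep the first 80 chars.
        for msg in reversed(messages):
            if msg.get("sender") == "bot":
                by_last_bot[msg.get("text", "")[:80]] += 1
                break
    return {
        "by_depth": by_depth.most_common(top_n),
        "by_intent": by_intent.most_common(top_n),
        "by_last_bot_message": by_last_bot.most_common(top_n),
    }


def detect_conversation_loops(conversations, threshold=3):
    """Flag sessions where the bot sent the same text >= threshold times in a row."""
    looped = []
    for conv in conversations:
        messages = conv.get("messages", [])
        bot_texts = [m.get("text", "") for m in messages if m.get("sender") == "bot"]
        longest, longest_text = 0, None
        run, previous = 0, None
        for text in bot_texts:
            run = run + 1 if text == previous else 1
            previous = text
            if run > longest:
                longest, longest_text = run, text
        if longest >= threshold:
            looped.append({
                "session_id": conv.get("session_id"),
                "repeated_message": longest_text,
                "repeat_count": longest,
                "total_messages": len(messages),
            })
    return looped
```

Repeats are counted over the bot-message stream only, so user messages between two identical bot replies do not reset the counter.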

Step 4: Hebrew Sentiment Analysis

Hebrew sentiment analysis requires special handling due to morphological complexity, negation patterns, and slang. For production accuracy, use DictaBERT (encoder, classification) or DictaLM 2.0-Instruct (generative, 7B parameters, Mistral-based). AlephBERT (onlplab/alephbert-base from BIU's OnlpLab) is an alternative encoder baseline, and a lexicon-based approach suits lightweight analysis. DictaLM 2.0 (released July 2024) is the current state-of-the-art Hebrew LLM from Dicta. It was trained on roughly 200B Hebrew+English tokens, has a 2.76 tokens-per-word compression rate, and ships an instruct variant. That variant is useful when you need a single model to both classify sentiment and summarize the conversation in Hebrew prose for the ops team.

Using DictaBERT (recommended for production):

Build HebrewSentimentAnalyzer around the dicta-il/dictabert-sentiment model (3-class: negative/neutral/positive).

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("dicta-il/dictabert-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("dicta-il/dictabert-sentiment").eval()

Wrap tok(text, return_tensors="pt", truncation=True, max_length=512, padding=True) + torch.softmax(model(**inputs).logits, dim=-1), then map each probability row to {label, score, scores} (label = argmax over ["negative","neutral","positive"]). Add an analyze_batch(texts, batch_size=32) that loops over slices.

Hebrew-specific sentiment challenges (summary):

Negation: "לא" before an adjective flips meaning. "לא רע" (not bad) reads mildly positive in Israeli usage.
Sarcasm and irony: very common in Israeli communication ("יופי, בדיוק מה שחיכיתי לו" can be deeply negative). DictaBERT handles some of it; fine-tune on domain data for better coverage.
Slang: evolves fast. "אחלה" / "סבבה" / "בומבה" are positive, "חרא" / "פאדיחה" are negative, "וואלה" is context-dependent.
Mixed Hebrew-English: users mix English words into Hebrew ("ה-support שלכם גרוע"). Ensure your model or lexicon handles both scripts in one message.

See references/hebrew-sentiment-guide.md for the full treatment of these challenges, including the slang lexicon and negation-handling code.

Step 5: Intent Recognition Accuracy Tracking

Track how well your chatbot understands user requests over time:

Build IntentAccuracyTracker to log (predicted, actual, confidence, timestamp) per prediction and expose:

confusion_matrix(): 2D {actual: {predicted: count}} over the sorted intent universe.
misclassification_report(min_count=5): top (actual, predicted) pairs where predicted != actual.
low_confidence_intents(threshold=0.6): intents whose mean confidence is below threshold, with sample_count and below_threshold_pct.
accuracy_trend(): daily {date, accuracy, sample_count} series for plotting (bucket by timestamp[:10]).
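
A compact sketch of the tracker described above. The `log` method name and the exact keys of the returned dictionaries are assumptions made for illustration; `min_count` is read here as "only report pairs seen at least this many times".

```python
from collections import Counter, defaultdict


class IntentAccuracyTracker:
    def __init__(self):
        self.records = []  # tuples of (predicted, actual, confidence, timestamp)

    def log(self, predicted, actual, confidence, timestamp):
        # Hypothetical method name; timestamp is an ISO-8601 string.
        self.records.append((predicted, actual, confidence, timestamp))

    def confusion_matrix(self):
        intents = sorted({r[0] for r in self.records} | {r[1] for r in self.records})
        matrix = {actual: {pred: 0 for pred in intents} for actual in intents}
        for predicted, actual, _, _ in self.records:
            matrix[actual][predicted] += 1
        return matrix

    def misclassification_report(self, min_count=5):
        pairs = Counter((a, p) for p, a, _, _ in self.records if p != a)
        return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]

    def low_confidence_intents(self, threshold=0.6):
        by_intent = defaultdict(list)
        for predicted, _, confidence, _ in self.records:
            by_intent[predicted].append(confidence)
        report = []
        for intent, confs in sorted(by_intent.items()):
            mean_conf = sum(confs) / len(confs)
            if mean_conf < threshold:
                below = sum(c < threshold for c in confs)
                report.append({
                    "intent": intent,
                    "mean_confidence": mean_conf,
                    "sample_count": len(confs),
                    "below_threshold_pct": 100 * below / len(confs),
                })
        return report

    def accuracy_trend(self):
        daily = defaultdict(lambda: [0, 0])  # date -> [correct, total]
        for predicted, actual, _, timestamp in self.records:
            day = timestamp[:10]  # bucket by YYYY-MM-DD prefix
            daily[day][0] += int(predicted == actual)
            daily[day][1] += 1
        return [
            {"date": day, "accuracy": correct / total, "sample_count": total}
            for day, (correct, total) in sorted(daily.items())
        ]
```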

How to get ground truth labels:

Manual labeling: Sample 100-200 conversations per week and have Hebrew-speaking annotators label actual intents. This is the gold standard.
Escalation signals: When a user explicitly corrects the bot ("לא, התכוונתי ל...") or asks for a human agent after a misunderstanding, flag the prior intent as incorrect.
Post-chat surveys: Ask "Did the bot understand what you needed?" and correlate with detected intent.

Step 6: User Satisfaction Measurement

Combine multiple signals to build a satisfaction score:

from dataclasses import dataclass

@dataclass
class SatisfactionSignals:
    """Combine multiple satisfaction signals into a composite score."""

    # Direct feedback (if available)
    csat_score: float | None = None      # 1-5 scale
    thumbs_rating: str | None = None     # "up" or "down"

    # Behavioral signals
    session_resolved: bool = False
    escalated_to_human: bool = False
    abandoned: bool = False
    repeated_fallbacks: int = 0
    loop_detected: bool = False

    # Sentiment signals
    final_sentiment: str = "neutral"     # positive/neutral/negative
    sentiment_trend: str = "stable"      # improving/stable/declining

    def composite_score(self) -> float:
        """Composite satisfaction (0.0-1.0). If `csat_score` is present, return
        `(csat_score - 1) / 4` directly. Otherwise start at 0.5 (or 0.8/0.2 for
        thumbs up/down), then add: +0.15 resolved, -0.1 escalated, -0.2 abandoned,
        -0.15 repeated_fallbacks>2, -0.2 loop_detected, +/-0.1-0.15 final_sentiment,
        +/-0.05-0.1 sentiment_trend; clamp to [0, 1]."""
        ...
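
The docstring above gives ranges rather than exact weights for the sentiment terms, so any implementation has to pick values. The sketch below is one such choice, written as a free function over any object with the `SatisfactionSignals` attributes. The specific sentiment weights (+0.1 / -0.15 for final sentiment, +0.05 / -0.1 for trend) are assumptions within the stated ranges, not the skill's canonical numbers.

```python
def composite_score(signals) -> float:
    """Composite satisfaction in [0.0, 1.0] from a SatisfactionSignals-like object."""
    # Direct CSAT wins outright: map the 1-5 scale onto 0-1.
    if signals.csat_score is not None:
        return (signals.csat_score - 1) / 4

    # Baseline: 0.8 / 0.2 for thumbs up / down, otherwise 0.5.
    score = {"up": 0.8, "down": 0.2}.get(signals.thumbs_rating, 0.5)

    # Behavioral adjustments, as listed in the docstring.
    if signals.session_resolved:
        score += 0.15
    if signals.escalated_to_human:
        score -= 0.1
    if signals.abandoned:
        score -= 0.2
    if signals.repeated_fallbacks > 2:
        score -= 0.15
    if signals.loop_detected:
        score -= 0.2

    # Sentiment adjustments: the exact weights here are assumed values.
    score += {"positive": 0.1, "negative": -0.15}.get(signals.final_sentiment, 0.0)
    score += {"improving": 0.05, "declining": -0.1}.get(signals.sentiment_trend, 0.0)

    # Clamp to [0, 1].
    return max(0.0, min(1.0, score))
```

Calibrate the weights against sessions that do have a CSAT score before relying on the composite for sessions that do not.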

Provide collect_post_chat_survey_he() that returns a Hebrew post-chat survey: title "נשמח לשמוע מה חשבת", a 1-5 rating on "עד כמה הצ'אטבוט עזר לך?", a yes/no on "האם הצ'אטבוט הבין את מה שרצית?", and an optional open "רוצה לשתף עוד משהו?" field. Use "שלח משוב" as the submit label.

Step 7: A/B Testing for Hebrew Response Variants

Test different phrasings, formality levels, and gender handling strategies:

Build HebrewABTestManager with three responsibilities:

Register a test. create_test(test_id, variants: {name: response_text}, traffic_split=None). Default split is uniform across variants. Store {variants, traffic_split, created_at} per test_id. Example variants:

{"formal": "שלום וברוכים הבאים. כיצד נוכל לסייע לכם?",
 "casual": "היי! איך אפשר לעזור?",
 "gender_neutral": "שלום! ניתן לבחור מהאפשרויות הבאות:"}

Deterministic bucketing. assign_variant(test_id, user_id) hashes f"{user_id}:{test_id}" with hashlib.md5, maps to a bucket in [0, 1), and walks the cumulative traffic_split so the same user always gets the same variant. Use this in get_response(...) and increment an impressions counter at the same time.
Outcome tracking. record_outcome(test_id, variant, completed=False, satisfaction=None, escalated=False) and get_test_results(test_id) returning per-variant {impressions, completion_rate, avg_satisfaction, escalation_rate}.
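
The deterministic bucketing step can be sketched as a standalone function. Taking `traffic_split` as an explicit argument and using the first 32 bits of the MD5 digest are illustrative choices; the class described above would look the split up by `test_id` instead.

```python
import hashlib


def assign_variant(test_id: str, user_id: str, traffic_split: dict) -> str:
    """Map a (user, test) pair to a variant; the same pair always gets the same answer.

    traffic_split is assumed to be a non-empty {variant_name: share} dict
    whose shares sum to 1.0.
    """
    digest = hashlib.md5(f"{user_id}:{test_id}".encode("utf-8")).hexdigest()
    # Use the first 32 bits of the digest as a bucket in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    cumulative = 0.0
    for variant, share in traffic_split.items():
        cumulative += share
        if bucket < cumulative:
            return variant
    # Guard against float rounding when the shares sum to slightly under 1.0.
    return list(traffic_split)[-1]
```

MD5 is used here only for stable, evenly spread hashing, not for security. Including `test_id` in the hashed key keeps a user's assignments independent across tests.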

Common Hebrew A/B test dimensions:

| Dimension | Variant A | Variant B | What to Measure |
|-----------|-----------|-----------|-----------------|
| Formality | "כיצד נוכל לסייע?" | "איך אפשר לעזור?" | Completion rate |
| Gender | Slash notation ("את/ה") | Gender-neutral ("ניתן ל...") | Satisfaction score |
| Length | Detailed explanation | Short, punchy response | Drop-off rate |
| Emoji usage | With emoji | Without emoji | Engagement |
| Error phrasing | "לא הצלחתי להבין" | "אפשר לנסח אחרת?" | Retry rate |

Step 8: Performance Dashboards and KPIs

Track these key metrics in your dashboard:

from dataclasses import dataclass

@dataclass
class ChatbotDashboard:
    """Key metrics for chatbot performance dashboard."""

    # Core metrics
    total_conversations: int = 0
    resolution_rate: float = 0.0        # % resolved without escalation
    first_contact_resolution: float = 0.0  # % resolved in first session
    avg_handle_time_seconds: float = 0.0
    escalation_rate: float = 0.0
    abandonment_rate: float = 0.0

    # User satisfaction
    avg_csat: float = 0.0               # 1-5 scale
    nps_score: float = 0.0              # -100 to 100
    thumbs_up_ratio: float = 0.0        # % positive

    # Intent accuracy
    intent_accuracy: float = 0.0        # % correctly classified
    fallback_rate: float = 0.0          # % of messages hitting fallback

    # Performance
    avg_response_time_ms: float = 0.0
    p95_response_time_ms: float = 0.0

    # Volume
    conversations_per_day: float = 0.0
    peak_hour: int = 0                  # 0-23
    busiest_day: str = ""               # "Sunday" etc.

    def to_report_dict(self) -> dict:
        """Group fields into core / satisfaction / accuracy / performance / volume
        sections for reporting (format rates as %, times as ms)."""
        ...

Implement build_dashboard(conversations, period_days=7) to populate the dataclass:

Outcome rates from Counter(c["outcome"]) / n.
avg_handle_time_seconds from (ended_at - started_at).total_seconds() per session.
avg_csat from satisfaction_score where present.
avg_response_time_ms / p95_response_time_ms from bot messages with response_time_ms (p95 via sorted_rts[int(len * 0.95)]).
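intent_accuracy = share of user messages with intent_confidence > 0.7. This is a confidence-based proxy; prefer the labeled accuracy from Step 5 when ground truth is available. fallback_rate = share of user messages with intent == "fallback".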
conversations_per_day = n / period_days. peak_hour and busiest_day from Counter over started_at hour and weekday.
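
Two of the trickier pieces of the list above, latency percentiles and traffic peaks, might look like this. The helper names `response_time_stats` and `traffic_peaks` are hypothetical, and `started_at` is assumed to be a `datetime.fromisoformat`-compatible string already expressed in Israel local time.

```python
from collections import Counter
from datetime import datetime

# Index matches datetime.weekday(): Monday is 0, Sunday is 6.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def response_time_stats(conversations):
    """Average and p95 latency over bot messages that carry response_time_ms."""
    rts = sorted(
        msg["response_time_ms"]
        for conv in conversations
        for msg in conv.get("messages", [])
        if msg.get("sender") == "bot" and msg.get("response_time_ms") is not None
    )
    if not rts:
        return {"avg_response_time_ms": 0.0, "p95_response_time_ms": 0.0}
    return {
        "avg_response_time_ms": sum(rts) / len(rts),
        # Index-based p95, as in the text: sorted_rts[int(len * 0.95)].
        "p95_response_time_ms": float(rts[int(len(rts) * 0.95)]),
    }


def traffic_peaks(conversations):
    """Most common start hour (0-23) and weekday name across sessions."""
    starts = [datetime.fromisoformat(conv["started_at"]) for conv in conversations]
    if not starts:
        return {"peak_hour": 0, "busiest_day": ""}
    peak_hour = Counter(s.hour for s in starts).most_common(1)[0][0]
    busiest = Counter(s.weekday() for s in starts).most_common(1)[0][0]
    return {"peak_hour": peak_hour, "busiest_day": WEEKDAYS[busiest]}
```

If timestamps are stored in UTC, convert them to Asia/Jerusalem before bucketing, otherwise peak hours shift by two or three hours.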

Israeli traffic patterns to expect:

Peak hours are typically 10:00-12:00 and 19:00-22:00 (Israel Time, UTC+2/+3)
Sunday is the busiest day (first workday of the Israeli week)
Friday afternoon and Saturday see minimal traffic
Holiday periods (Rosh Hashana, Pesach, Sukkot) show different patterns

Retention and Returning-User Metrics

Session-level metrics tell you how a single conversation went, but not whether the bot earns repeat use. Track these retention dimensions alongside the dashboard above (all require a stable user_id across sessions, pseudonymized per the Privacy and Consent section):

For each user_id, collect the set of distinct dates with a conversation. Then:

D1 return rate = share whose first-date + 1 day is also in their set.
D7 return rate = share whose first-date + 2..7 days intersects their set.
Repeat-contact rate = share with > 1 distinct date.
D1 / D7 return rate: share of users who start a new conversation the day after, or within a week of, their first contact. D7 is more stable than D1 for low-volume Israeli bots.
Repeat-contact rate: share of users with more than one conversation. On a support bot this can be good (trust) or bad (unresolved issues), so read it with first-contact resolution.
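
The three retention definitions above translate fairly directly into code. This sketch assumes each conversation carries a pseudonymized `user_id` and an ISO-8601 `started_at` whose first ten characters are the local date; the function name `retention_metrics` is illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def retention_metrics(conversations):
    """D1 / D7 return rates and repeat-contact rate per distinct user."""
    dates_by_user = defaultdict(set)
    for conv in conversations:
        day = date.fromisoformat(conv["started_at"][:10])
        dates_by_user[conv["user_id"]].add(day)

    total = len(dates_by_user)
    if total == 0:
        return {"d1_return_rate": 0.0, "d7_return_rate": 0.0, "repeat_contact_rate": 0.0}

    d1 = d7 = repeat = 0
    for dates in dates_by_user.values():
        first = min(dates)
        if first + timedelta(days=1) in dates:
            d1 += 1
        # D7 as defined above: any return on days 2..7 after first contact.
        if any(first + timedelta(days=k) in dates for k in range(2, 8)):
            d7 += 1
        if len(dates) > 1:
            repeat += 1
    return {
        "d1_return_rate": d1 / total,
        "d7_return_rate": d7 / total,
        "repeat_contact_rate": repeat / total,
    }
```

The sketch does not exclude users whose first contact falls in the last week of the data, so their D7 window is incomplete; drop those users from the cohort if you need unbiased rates.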

Step 9: Hebrew-Specific Analytics Challenges

RTL Text in Charts and Visualizations

When rendering analytics dashboards that display Hebrew text, handle these RTL issues:

import matplotlib.pyplot as plt
import matplotlib

# Use a font that supports Hebrew
matplotlib.rcParams["font.family"] = ["DejaVu Sans", "Arial", "Heebo"]

# Tip: Use horizontal bar charts so Hebrew labels read naturally on the y-axis.
# For interactive dashboards, Plotly handles RTL better than matplotlib.
# Use font-family "Heebo, Arial, sans-serif" and add extra left margin for labels.

Hebrew Word Tokenization for Word Clouds

Standard whitespace tokenization does not work well for Hebrew due to prefix particles (ב, ה, ו, ל, מ, כ, ש):

# Standard whitespace tokenization fails for Hebrew due to prefix particles.
# Use YAP (https://github.com/OnlpLab/yap) for production, or strip common prefixes:
HEBREW_PREFIXES = ["ב", "ה", "ו", "ל", "מ", "כ", "ש", "וה", "של", "לה"]

# Strip prefixes only if word is long enough (>3 chars) and remainder >= 2 chars.
# For word clouds: use bidi algorithm to convert Hebrew for display,
# remove stopwords (של, את, על, עם, אני, זה, כי, גם, לא, יש, אין, מה).
# See references/hebrew-sentiment-guide.md for detailed tokenization code.
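
As a rough illustration of the prefix-stripping rule in the comments above (word longer than 3 characters, remainder of at least 2), the following sketch tries longer prefixes first. The function name `strip_hebrew_prefix` is hypothetical, and this is a heuristic rather than real morphological analysis.

```python
HEBREW_PREFIXES = ["ב", "ה", "ו", "ל", "מ", "כ", "ש", "וה", "של", "לה"]

# Try two-letter prefixes before one-letter ones so "וה" wins over "ו".
_PREFIXES_LONGEST_FIRST = sorted(HEBREW_PREFIXES, key=len, reverse=True)


def strip_hebrew_prefix(word: str) -> str:
    """Strip one common prefix if the word is > 3 chars and >= 2 chars remain."""
    if len(word) <= 3:
        return word
    for prefix in _PREFIXES_LONGEST_FIRST:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            return word[len(prefix):]
    return word


def tokenize_for_word_cloud(text: str, stopwords=frozenset()) -> list:
    """Whitespace split, prefix strip, then stopword filter."""
    tokens = (strip_hebrew_prefix(tok) for tok in text.split())
    return [tok for tok in tokens if tok and tok not in stopwords]
```

The heuristic over-strips words whose first letters merely look like a prefix: "שלום" loses "של" and becomes "ום". That is acceptable for a rough word cloud, but it is the main reason to prefer a morphological analyzer such as YAP in production.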

Mixed Hebrew-English Query Handling

Israeli users frequently mix languages. Track language distribution and handle accordingly:

import re

def detect_message_language(text: str) -> str:
    """Detect primary language by counting Hebrew vs English characters."""
    hebrew_chars = len(re.findall(r'[\u0590-\u05FF]', text))
    english_chars = len(re.findall(r'[a-zA-Z]', text))
    total = hebrew_chars + english_chars
    if total == 0:
        return "unknown"
    return "he" if hebrew_chars / total >= 0.5 else "en"

# Track mixed-language rate: messages where 20-80% is Hebrew.
# Israeli users frequently code-switch between Hebrew and English.
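
The mixed-language rate mentioned in the comment above can be computed with the same character-counting idea. The helper names and the decision to skip messages with no Hebrew or Latin letters are assumptions of this sketch.

```python
import re

HEBREW_RE = re.compile(r"[\u0590-\u05FF]")
LATIN_RE = re.compile(r"[a-zA-Z]")


def hebrew_share(text: str):
    """Fraction of Hebrew letters among Hebrew + Latin letters, or None if neither."""
    hebrew = len(HEBREW_RE.findall(text))
    latin = len(LATIN_RE.findall(text))
    total = hebrew + latin
    return hebrew / total if total else None


def mixed_language_rate(texts, low=0.2, high=0.8) -> float:
    """Share of messages whose Hebrew fraction falls within [low, high]."""
    shares = [s for s in (hebrew_share(t) for t in texts) if s is not None]
    if not shares:
        return 0.0
    return sum(low <= s <= high for s in shares) / len(shares)
```

Messages that contain only digits, emoji, or punctuation are excluded from the denominator so they do not dilute the rate.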

Step 10: Alerting and Anomaly Detection

Set up alerts to catch problems before they affect too many users:

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class AlertRule:
    """Define an alerting rule for chatbot metrics."""
    name: str
    metric: str
    operator: str          # "gt" (greater than), "lt" (less than)
    threshold: float
    window_minutes: int    # Rolling window
    severity: str          # "critical", "warning", "info"
    description_he: str    # Hebrew description for ops team


# Recommended alert rules for Hebrew chatbots
# AlertRule(name, metric, operator, threshold, window_minutes, severity, description_he)
DEFAULT_ALERT_RULES = [
    AlertRule("high_escalation_rate", "escalation_rate", "gt", 0.35, 60, "warning",
              "שיעור הסלמה גבוה מ-35% בשעה האחרונה"),
    AlertRule("satisfaction_drop", "avg_csat", "lt", 3.0, 120, "critical",
              "שביעות רצון ממוצעת ירדה מתחת ל-3.0 בשעתיים האחרונות"),
    AlertRule("high_abandonment", "abandonment_rate", "gt", 0.40, 60, "critical",
              "שיעור נטישה גבוה מ-40% בשעה האחרונה"),
    AlertRule("high_fallback_rate", "fallback_rate", "gt", 0.25, 30, "warning",
              "שיעור fallback גבוה מ-25% בחצי שעה האחרונה"),
    AlertRule("slow_response", "p95_response_time_ms", "gt", 3000, 15, "warning",
              "זמן תגובה P95 חורג מ-3 שניות ברבע השעה האחרון"),
    AlertRule("new_unrecognized_intents", "new_unknown_intents_count", "gt", 20, 60,
              "info", "יותר מ-20 כוונות לא מזוהות חדשות בשעה האחרונה"),
]

AlertManager wraps the rule list. check_metrics(current_metrics: dict) walks every rule, skips when the metric is missing, and triggers when value > threshold (op gt) or value < threshold (op lt). Each triggered alert is a dict with rule_name, severity, metric, current_value, threshold, description_he, and triggered_at.
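
A minimal sketch of the manager described above. It accepts any rule objects exposing the `AlertRule` attributes, and the UTC ISO-8601 format for `triggered_at` is an assumption.

```python
from datetime import datetime, timezone


class AlertManager:
    def __init__(self, rules):
        self.rules = list(rules)

    def check_metrics(self, current_metrics: dict) -> list:
        """Return one dict per rule whose threshold is crossed."""
        triggered = []
        for rule in self.rules:
            value = current_metrics.get(rule.metric)
            if value is None:
                continue  # metric missing from this snapshot: skip the rule
            fired = (
                (rule.operator == "gt" and value > rule.threshold)
                or (rule.operator == "lt" and value < rule.threshold)
            )
            if fired:
                triggered.append({
                    "rule_name": rule.name,
                    "severity": rule.severity,
                    "metric": rule.metric,
                    "current_value": value,
                    "threshold": rule.threshold,
                    "description_he": rule.description_he,
                    # Assumed format: timezone-aware UTC timestamp.
                    "triggered_at": datetime.now(timezone.utc).isoformat(),
                })
        return triggered
```

The caller is expected to compute `current_metrics` over each rule's `window_minutes`; this sketch only compares values and does no windowing or de-duplication of repeated alerts.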

Step 11: Reporting Templates

Generate periodic reports summarizing chatbot performance:

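Implement generate_weekly_report(dashboard, period_start, period_end, previous_dashboard=None). The optional previous_dashboard goes last because Python does not allow a parameter without a default after one with a default.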

Helper trend_arrow(current, previous, higher_is_better): returns (ללא שינוי) for < 1% delta; otherwise emits [v] +X.X% (good direction) or [!] +X.X% (bad direction).
Emit a # דוח ביצועי צ'אטבוט שבועי header, period subheader, and a | מדד | ערך | שינוי מהשבוע הקודם | markdown table over: שיחות, שיעור פתרון, CSAT, שיעור הסלמה (lower-is-better), שיעור נטישה (lower-is-better), דיוק זיהוי כוונות, זמן תגובה ממוצע (lower-is-better).
Append a ## תנועה block with conversations_per_day, peak_hour, busiest_day.
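
The `trend_arrow` helper could be sketched like this. Returning an empty string when there is no previous value, and computing the delta relative to the previous value, are assumptions; the text above only fixes the three output shapes.

```python
def trend_arrow(current, previous, higher_is_better=True) -> str:
    """Format week-over-week change as a marker plus a signed percentage."""
    if previous is None or previous == 0:
        return ""  # assumed behavior: no baseline, no trend
    delta_pct = (current - previous) / abs(previous) * 100
    if abs(delta_pct) < 1:
        return "(ללא שינוי)"
    # A rise is good when higher is better; a fall is good when lower is better.
    improved = (delta_pct > 0) == higher_is_better
    marker = "[v]" if improved else "[!]"
    return f"{marker} {delta_pct:+.1f}%"
```

For lower-is-better metrics such as escalation rate, pass `higher_is_better=False` so that a drop is marked `[v]` and a rise is marked `[!]`.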

Step 12: Integration with Chatbot Platforms

Dialogflow CX Analytics

Implement parse_dialogflow_cx_logs(bigquery_rows) to fold a Dialogflow CX BigQuery export into the standard conversations shape.

Export query: SELECT * FROM project.dataset.dialogflow_cx_interactions WHERE DATE(request_time) BETWEEN @start AND @end.
Group rows by session_id. For each session, track min/max request_time as started_at / ended_at.
For each row, append a user message (text = query_text, intent = matched_intent, intent_confidence) and/or bot message (text = response_text). Sort each session's messages by timestamp. Set language = "he", outcome = "unknown" (derive from flow completion downstream).
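
A sketch of the folding logic, assuming each row is a dict with the column names used above (`session_id`, `request_time`, `query_text`, `matched_intent`, `intent_confidence`, `response_text`). Check those names against your actual export schema; if the fields arrive nested inside request/response JSON, flatten them first.

```python
from collections import defaultdict


def parse_dialogflow_cx_logs(bigquery_rows):
    """Group flattened interaction rows into Step 1 style conversations."""
    rows_by_session = defaultdict(list)
    for row in bigquery_rows:
        rows_by_session[row["session_id"]].append(row)

    conversations = []
    for session_id, rows in rows_by_session.items():
        # Assumption: request_time values sort chronologically
        # (datetime objects or ISO-8601 strings in one timezone).
        rows = sorted(rows, key=lambda r: r["request_time"])
        messages = []
        for row in rows:
            if row.get("query_text"):
                messages.append({
                    "timestamp": row["request_time"],
                    "sender": "user",
                    "text": row["query_text"],
                    "intent": row.get("matched_intent"),
                    "intent_confidence": row.get("intent_confidence"),
                })
            if row.get("response_text"):
                messages.append({
                    "timestamp": row["request_time"],
                    "sender": "bot",
                    "text": row["response_text"],
                })
        conversations.append({
            "session_id": session_id,
            "language": "he",
            "started_at": rows[0]["request_time"],
            "ended_at": rows[-1]["request_time"],
            "messages": messages,
            "outcome": "unknown",  # derive from flow completion downstream
        })
    return conversations
```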

Rasa Tracker Store Analytics

Note: Rasa Open Source is in maintenance mode. The intent-based tracker-store analytics below apply to existing Rasa OSS deployments; new Rasa builds use CALM (Conversational AI with Language Models), which is dialogue-driven rather than intent-driven, so intent-accuracy metrics map differently there. See the legacy OSS docs at https://legacy-docs-oss.rasa.com/docs/rasa/ for tracker-store details.

Implement parse_rasa_tracker_events(tracker_events) to fold a Rasa tracker-store stream into the standard conversations shape.

Query: SELECT * FROM events WHERE sender_id = @sender_id ORDER BY timestamp.
Iterate events. On session_started, flush the in-progress session and start a new one. On user, append a user message with intent.name and intent.confidence from parse_data. On bot, append a bot message with text. On action with name == "action_human_handoff", set outcome = "escalated". Flush the trailing session at the end.
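
The event walk above might be sketched as follows, assuming each event is a dict in the Rasa tracker format (`event`, `timestamp`, and for user events `parse_data.intent`). Skipping sessions that contain no messages is a choice made here, and `action_human_handoff` is the custom action name used in the text.

```python
def parse_rasa_tracker_events(tracker_events):
    """Fold one sender's ordered event stream into Step 1 style conversations."""

    def new_session():
        return {"language": "he", "messages": [], "outcome": "unknown"}

    conversations = []
    current = None
    for event in tracker_events:
        kind = event.get("event")
        if kind == "session_started":
            # Flush the in-progress session (if it has content) and start a new one.
            if current is not None and current["messages"]:
                conversations.append(current)
            current = new_session()
            continue
        if current is None:
            current = new_session()  # events seen before any session_started
        if kind == "user":
            intent = (event.get("parse_data") or {}).get("intent") or {}
            current["messages"].append({
                "timestamp": event.get("timestamp"),
                "sender": "user",
                "text": event.get("text"),
                "intent": intent.get("name"),
                "intent_confidence": intent.get("confidence"),
            })
        elif kind == "bot":
            current["messages"].append({
                "timestamp": event.get("timestamp"),
                "sender": "bot",
                "text": event.get("text"),
            })
        elif kind == "action" and event.get("name") == "action_human_handoff":
            current["outcome"] = "escalated"
    # Flush the trailing session.
    if current is not None and current["messages"]:
        conversations.append(current)
    return conversations
```

Rasa timestamps are Unix epoch floats, so convert them to ISO-8601 and add `session_id`, `started_at`, and `ended_at` before passing the result to functions that expect the full Step 1 schema.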

WhatsApp Business Platform pricing notes

Many Israeli chatbots run on WhatsApp Cloud API, where send-out cost is a first-class analytics dimension. Pricing changed on July 1, 2025 from a per-conversation model to per-message billing across 4 categories:

| Category | Pricing posture | When to use |
|----------|-----------------|-------------|
| Marketing | Highest per-message rate, no volume discount | Promotions, broadcasts, re-engagement |
| Utility | Lower than marketing (typically under $0.03), eligible for volume discounts | Order updates, appointment reminders, account notices triggered by user action |
| Authentication | Lowest non-free tier, eligible for volume discounts | OTP codes for login / payment / 2FA |
| Service | Free | Any reply from the business within the 24-hour customer service window (user-initiated session) |

Two free windows worth tracking explicitly in your analytics:

24-hour service window. When a user sends an inbound message, you can reply with free-form text (no template, no charge) for the next 24 hours. Optimizing analytics for "did we resolve in the service window?" can eliminate a whole template-cost line item for reactive support flows. See https://developers.facebook.com/documentation/business-messaging/whatsapp/pricing.
72-hour click-to-WhatsApp / Facebook ad window. When the user arrives from a click-to-WhatsApp ad or a Facebook Page CTA, all messages (including templates) are free for 72 hours.

Add template_category (marketing/utility/authentication/service) and an arrived_via_ctw_ad boolean to your conversation log schema so finance and product can split CSAT/resolution by paid vs. free interaction. Israeli rates are not published per-country in the public docs; pull your specific Israel rate from the Meta Business Manager pricing tool or your BSP (e.g. Twilio, 360dialog, Vonage) when sizing campaigns.

Anti-spam compliance (Israel Communications Law, Section 30A)

If your chatbot sends marketing messages (broadcasts, promotional templates on WhatsApp, Telegram campaigns, SMS retargeting), Section 30A of the Communications Law (Telecom and Broadcasts) 5742-1982 applies. The law requires prior written opt-in consent before sending advertising messages via SMS, email, fax, robocalls, and, under the 2008 amendment language as interpreted by Israeli courts, electronic communication that includes WhatsApp, Telegram, and similar IM apps. The term "advertisement" is interpreted broadly: any message not purely service-related can be treated as advertising.

Practical analytics tracking:

Tag every send as opt_in_basis: "explicit_form" / "ctw_ad_click" / "service_reply" / "transactional". This is your audit trail if a complaint reaches the Ministry of Communications.
Track unsubscribe path success rate. Marketing messages must include the word "advertisement" (פרסומת), the sender's name and address, and a working opt-out path. Measure the time-to-unsubscribe and the success rate of the opt-out flow as a compliance KPI.
Service vs. marketing split. Run completion-rate and CSAT separately for opt-in marketing flows vs. user-initiated service flows. They behave very differently, and combining them masks both.
Cross-reference: gws-hebrew-email-automation and israeli-telegram-business-bot cover the same opt-in regime for email and Telegram. Use those skills if you also operate those channels.

This is engineering guidance, not legal advice. The maximum statutory damages per unsolicited marketing message are NIS 1,000 without proof of damages, so a misconfigured broadcast to even a few hundred non-consenting users can become a meaningful financial event. Confirm specifics with a privacy lawyer.

Experimentation platforms for Hebrew chatbots

When you outgrow HebrewABTestManager (in-process bucketing, in-memory results) and need real statistical analysis with sequential testing and CUPED variance reduction, the mainstream feature-flag + experimentation platforms all work fine for Hebrew chatbots. None of them care what language your variant_text is in. Pick by team and infra fit:

| Platform | Best fit | Notes for Hebrew chatbot teams |
|----------|----------|--------------------------------|
| Statsig | Teams wanting flags + experiments + product analytics in one stack | OpenAI acquired Statsig in 2025 for $1.1B; generous free tier still good for small Israeli bots. |
| LaunchDarkly | Mature enterprise teams needing approvals, audit logs, RBAC | The "safe" enterprise choice; pair with your existing analytics for stats. |
| GrowthBook | Teams with a data warehouse (BigQuery, Snowflake, Postgres) who want stats run against their own data | Open source; does NOT collect event data, so Hebrew transcripts never leave your warehouse, useful for Amendment 13 data-residency posture. |

One Hebrew-specific gotcha: plan on longer test durations (2+ weeks, 200+ impressions per variant). Israeli user bases are smaller, and weekly seasonality (Sun-Thu work week) makes 1-week tests unreliable.

Modern analytics stack notes (GA4 + Mixpanel, 2026)

GA4 "AI Assistant" channel. GA4 now ships a built-in Channel Group: AI Assistant (Medium ai-assistant) that auto-categorizes traffic from ChatGPT, Gemini, and Claude (Perplexity reportedly included; Google has not formally confirmed). If you embed your bot on a marketing site, this is the easiest way to attribute incoming traffic referred by an LLM to the bot's funnel, no custom regex needed (https://martech.org/ga4-now-tracks-ai-chatbot-traffic-automatically/).
Mixpanel Spark + MCP Server. Mixpanel released Spark (AI query builder) and an MCP server in 2025-2026 that lets Claude / ChatGPT / Cursor query Mixpanel data conversationally. For Hebrew dashboards specifically this matters because you can ask follow-up questions in Hebrew and Spark routes them to the right event/property, useful when the ops team is not fluent in funnel-query UI.

Examples

Example 1: Analyze chatbot performance for the past week

User says: "Analyze my Hebrew chatbot logs from the past week and show me where users are dropping off."

Actions:

Load conversation logs from the specified time period.
Run compute_flow_metrics() to get session-level stats.
Run detect_drop_off_points() to find abandonment patterns.
Run detect_conversation_loops() to identify stuck users.
Generate a summary with actionable recommendations.

Result: Report with completion rate, top drop-off points, looping conversations, and abandonment patterns.

Example 2: Set up A/B testing for greeting messages

User says: "I want to test whether a formal or casual Hebrew greeting works better."

Actions:

Create an A/B test with HebrewABTestManager.create_test().
Define variants: formal ("כיצד נוכל לסייע לכם היום?") vs. casual ("היי! מה אפשר לעשות בשבילך?").
Configure traffic split (50/50).
Integrate with the bot's greeting handler.
Set up outcome tracking (completion rate, CSAT, escalation).

Result: Running A/B test with deterministic user assignment and per-variant outcome tracking (completion rate, satisfaction, escalation).

Example 3: Set up anomaly alerting

User says: "Alert me if chatbot satisfaction drops suddenly."

Actions:

Configure AlertManager with satisfaction and escalation rules.
Set up rolling window calculations for recent metrics.
Connect alerts to notification channels (Slack, email, PagerDuty).
Add Hebrew-language alert descriptions for the ops team.

Result: Real-time monitoring that triggers alerts when CSAT drops below 3.0, escalation rate exceeds 35%, or abandonment spikes above 40%.

Example 4: Generate a weekly performance report

User says: "Create a Hebrew weekly report for the chatbot team."

Actions:

Run build_dashboard() for the current and previous weeks.
Call generate_weekly_report() with both dashboards for trend arrows.
Include drop-off analysis and intent accuracy breakdown.
Format output in Hebrew with RTL-compatible tables.

Result: A formatted Hebrew report with week-over-week comparisons, trend indicators, and key metrics ready to share with the team.

Bundled Resources

Scripts

scripts/conversation-analyzer.py -- Analyze chatbot conversation logs for key metrics (drop-off, sentiment, resolution). Run: python scripts/conversation-analyzer.py --help

References

references/chatbot-metrics-glossary.md -- Glossary of chatbot analytics metrics with Hebrew translations and industry benchmarks. Consult when defining KPIs or explaining metrics to Hebrew-speaking stakeholders.
references/hebrew-sentiment-guide.md -- Guide to Hebrew sentiment analysis challenges including negation, sarcasm, slang, and mixed-language handling. Consult when building or tuning Hebrew sentiment models.

Gotchas

Hebrew sentiment analysis requires Israeli-specific training data. Standard English sentiment models misclassify Hebrew sarcasm (very common in Israeli communication) as neutral or positive.
Israeli chatbot usage peaks on Sunday mornings (start of work week), not Monday. Weekly analytics reports should anchor to Sunday-Thursday.
Hebrew text analytics must handle prefixed particles (ב-, ל-, כ-, מ-) that change word boundaries. Standard tokenizers trained on English split Hebrew words incorrectly.
Israeli users frequently code-switch between Hebrew and English within a single chatbot conversation. Analytics tools must handle bilingual sessions, not treat them as two separate languages.

Privacy and Consent

This skill ingests full conversation transcripts and user_id values, and runs sentiment analysis on user messages. Conversation text is personal data and often contains sensitive content (health, finances, complaints). Handle it under Israel's Privacy Protection Law, including Amendment 13 (in force August 2025), which tightened consent, notice, accountability, and data-minimization obligations.

Practical rules:

Consent and notice. Get consent to store and analyze chat content, and tell users in your privacy notice that conversations are retained and analyzed for quality. Sentiment analysis on user messages is a processing purpose that should be disclosed.
Pseudonymize user_id. Do not analyze raw phone numbers, emails, or Teudat Zehut as the identifier. Hash or tokenize user_id before it reaches the analytics pipeline, and keep the mapping table separate and access-controlled. Retention and A/B-test bucketing still work on a stable pseudonymous ID.
Minimize and redact. Strip or mask entities you do not need for analytics (ID numbers, full names, card numbers) before storing transcripts. You rarely need the raw PII to measure drop-off or sentiment.
Retention limits. Set an explicit retention window for raw transcripts (for example 90 days) and keep only aggregated metrics long-term. Document the window and delete on schedule.
Access control and location. Restrict who can read raw conversations, log access, and confirm where the data is stored and processed.
This is engineering guidance, not legal advice. Confirm your specific obligations with a privacy professional.
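
The "Pseudonymize user_id" rule above can be sketched with a keyed hash. This is a minimal illustration, not a compliance-reviewed implementation: the function name `pseudonymize_user_id`, the normalization step, and the 32-character truncation are assumptions, and the secret key is assumed to live outside the analytics pipeline.

```python
import hashlib
import hmac


def pseudonymize_user_id(raw_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonymous ID for raw_id using HMAC-SHA256.

    The same raw_id and key always give the same token, so retention and
    A/B bucketing keep working without exposing the raw identifier.
    """
    # Assumption: identifiers are compared case-insensitively and without padding.
    normalized = raw_id.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    # Truncation to 32 hex chars is an illustrative choice, not a requirement.
    return digest.hexdigest()[:32]


if __name__ == "__main__":
    key = b"replace-with-a-secret-from-your-vault"  # hypothetical key
    print(pseudonymize_user_id("+972-50-0000000", key))
```

A keyed hash (rather than a plain hash) is used here because phone numbers and ID numbers come from a small space and could otherwise be recovered by brute force.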

Recommended MCP Servers

No MCP server is required for this skill. It operates entirely on exported conversation logs (BigQuery exports, Rasa tracker-store dumps, application log files) that you load from disk and analyze locally with the bundled Python script. There is no live API to wrap, so no MCP integration is needed.

Reference Links

| Source | URL | What to Check |
|--------|-----|---------------|
| Dialogflow CX language reference | https://docs.cloud.google.com/dialogflow/cx/docs/reference/language | Hebrew language code he-il (use this on new agents; iw is deprecated) |
| Dialogflow CX analytics | https://cloud.google.com/dialogflow/cx/docs/concept/analytics | Built-in conversation analytics, intent metrics |
| Rasa CALM docs | https://rasa.com/docs/learn/concepts/calm/ | Dialogue-driven flows for Rasa Pro 3.x, replaces intent-based design for new builds |
| Rasa OSS documentation (legacy) | https://legacy-docs-oss.rasa.com/docs/rasa/ | Event tracking, tracker stores, custom analytics integrations (maintenance mode) |
| WhatsApp Business Platform pricing | https://developers.facebook.com/documentation/business-messaging/whatsapp/pricing | Per-message rates by country + category (marketing/utility/auth/service), free 24h window rules |
| DictaBERT (Hebrew BERT suite) | https://huggingface.co/dicta-il/dictabert | Pre-trained Hebrew BERT for classification fine-tunes |
| DictaBERT sentiment | https://huggingface.co/dicta-il/dictabert-sentiment | Off-the-shelf Hebrew sentiment classifier (3-class) |
| DictaLM 2.0 Instruct | https://huggingface.co/dicta-il/dictalm2.0-instruct | Generative Hebrew LLM (7B, Mistral-based) for summaries + classification in one call |
| AlephBERT | https://huggingface.co/onlplab/alephbert-base | Alternative Hebrew BERT from BIU OnlpLab |
| HuggingFace Hebrew models | https://huggingface.co/models?language=he | Browse the full Hebrew model catalog |
| Mixpanel help | https://mixpanel.com/help | Funnel analysis, cohort retention for chat flows |
| Matomo analytics | https://matomo.org/docs/ | Self-hosted event tracking, privacy-friendly |
| Israel Privacy Amendment 13 (IAPP) | https://iapp.org/news/a/israel-marks-a-new-era-in-privacy-law-amendment-13-ushers-in-sweeping-reform | Effective Aug 14, 2025: consent, notice, retention limits, deletion mechanisms |
| Section 30A anti-spam guide (DLA Piper) | https://www.dlapiperdataprotection.com/index.html?t=electronic-marketing&c=IL | Opt-in regime for SMS / email / IM marketing in Israel |

Troubleshooting

DictaBERT model not loading: the dicta-il/dictabert-sentiment model needs PyTorch + transformers (~500MB). Run pip install torch transformers; for CPU-only, install torch from https://download.pytorch.org/whl/cpu.
Hebrew text appears reversed in charts: matplotlib has no native RTL. Apply python-bidi (bidi.algorithm.get_display()) before rendering, or switch to Plotly.
Tokenization produces wrong word frequencies: whitespace splitting ignores Hebrew prefix particles. Use the prefix-stripping tokenizer in Step 9, or the YAP morphological analyzer (https://github.com/OnlpLab/yap) for production.
Sentiment scores unreliable for short messages: messages of 1-3 words lack context ("סבבה" can be positive or neutral). For under 4 words, rely on behavioral signals (continued / escalated / abandoned) instead, combined with satisfaction signals from Step 6.
A/B test results not statistically significant: usually insufficient sample size, common for smaller Israeli user bases. Run at least 2 weeks, aim for 200+ impressions per variant, target p < 0.05.
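
For the A/B significance item above, one common way to get a p-value for a difference in completion rates is a two-proportion z-test. The sketch below is an assumption about how you might check it, using only the standard library; the function name `two_proportion_z_test` is hypothetical and the skill itself does not prescribe a specific test.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Return (z, two_sided_p_value) for the difference between two rates."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each variant needs at least one impression")
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all successes or all failures: no detectable difference
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(120, 200, 100, 200)
    print(f"z={z:.2f}, p={p:.4f}")
```

With 200 impressions per variant, only fairly large differences (here 60% vs. 50% completion) fall under p < 0.05, which is why small user bases often need longer test windows.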

skills-il/israeli-chatbot-analytics

israeli-chatbot-analytics/SKILL.md

Analyze and optimize Hebrew chatbot performance with conversation flow analytics, Hebrew sentiment analysis, drop-off detection, user satisfaction scoring, A/B testing for response variants, and reporting dashboards. Use when user asks to "analyze chatbot performance", "measure chatbot satisfaction", "track Hebrew bot metrics", "analitika shel tsatbot" (Hebrew transliteration), or needs help with conversation analytics, intent accuracy tracking, or chatbot reporting. Supports Dialogflow, Rasa, and custom bot platforms. Do NOT use for building chatbots (use hebrew-chatbot-builder), Hebrew NLP model training (use hebrew-nlp-toolkit), customer support workflow setup (use israeli-customer-support-automator), or voice bot development (use hebrew-voice-bot-builder).

9 stars

tools

Updated May 21, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add skills-il/developer-tools israeli-chatbot-analytics

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Clean: Trivy, container and dependency vulnerability scanner (95%)
Clean: Semgrep, static code analysis for vulnerabilities (95%)
Clean: mcp-scan (Snyk), Model Context Protocol security validation (95%)
Skipped: Snyk (dep), open source security scanning (50%)
Skipped: Socket.dev, supply chain security analysis (50%)
Skipped: VirusTotal, multi-engine malware detection (50%)
Skipped: CrowdStrike, advanced threat intelligence (50%)
Skipped: OSV-Scanner, Open Source Vulnerability database check (50%)
Skipped: OWASP Dep-Check (50%)

Last scanned: May 21, 2026, 4:33 AM (135.3s, 7 files scanned)

SKILL.md

name: israeli-chatbot-analytics
description: Analyze and optimize Hebrew chatbot performance with conversation flow analytics, Hebrew sentiment analysis, drop-off detection, user satisfaction scoring, A/B testing for response variants, and reporting dashboards. Use when user asks to "analyze chatbot performance", "measure chatbot satisfaction", "track Hebrew bot metrics", "analitika shel tsatbot" (Hebrew transliteration), or needs help with conversation analytics, intent accuracy tracking, or chatbot reporting. Supports Dialogflow, Rasa, and custom bot platforms. Do NOT use for building chatbots (use hebrew-chatbot-builder), Hebrew NLP model training (use hebrew-nlp-toolkit), customer support workflow setup (use israeli-customer-support-automator), or voice bot development (use hebrew-voice-bot-builder).
license: MIT
allowed-tools: Bash(python:*), Bash(pip:*)
compatibility: Requires Python 3.10+. Works with Claude Code, Cursor, Windsurf.

Israeli Chatbot Analytics

Instructions

Step 1: Collect and Structure Conversation Logs

Before analyzing, ensure conversation data is structured consistently. Each conversation session should include:

# Standard conversation log schema
conversation_log = {
    "session_id": "uuid-string",
    "user_id": "anonymous-or-identified",
    "channel": "whatsapp|telegram|web|app",
    "language": "he",           # Primary language detected
    "started_at": "ISO-8601",
    "ended_at": "ISO-8601",
    "messages": [
        {
            "timestamp": "ISO-8601",
            "sender": "user|bot",
            "text": "שלום, אני צריך עזרה",
            "intent": "greeting",           # Detected intent
            "intent_confidence": 0.92,       # Model confidence
            "entities": [],                  # Extracted entities
            "response_time_ms": 340,         # Bot response latency
        }
    ],
    "outcome": "resolved|escalated|abandoned|unknown",
    "satisfaction_score": null,   # CSAT score if collected
    "metadata": {
        "bot_version": "2.1.0",
        "ab_variant": "formal_he",
    }
}

If your platform does not export in this format, write a transformer to normalize logs before analysis. Common platforms and their export formats:

Step 2: Conversation Flow Analysis

Analyze session-level metrics to understand overall chatbot health:

Key benchmarks for Hebrew chatbots (Israeli market, 2025-2026):

Step 3: Drop-off Point Detection

Identify where users abandon conversations. This reveals UX problems, confusing prompts, or missing capabilities:

Step 4: Hebrew Sentiment Analysis

Using DictaBERT (recommended for production):

Build HebrewSentimentAnalyzer around the dicta-il/dictabert-sentiment model (3-class: negative/neutral/positive).

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("dicta-il/dictabert-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("dicta-il/dictabert-sentiment").eval()

Hebrew-specific sentiment challenges (summary):

Negation: "לא" before an adjective flips meaning. "לא רע" (not bad) reads mildly positive in Israeli usage.
Sarcasm and irony: very common in Israeli communication ("יופי, בדיוק מה שחיכיתי לו" can be deeply negative). DictaBERT handles some of it; fine-tune on domain data for better coverage.
Slang: evolves fast. "אחלה" / "סבבה" / "בומבה" are positive, "חרא" / "פאדיחה" are negative, "וואלה" is context-dependent.
Mixed Hebrew-English: users mix English words into Hebrew ("ה-support שלכם גרוע"). Ensure your model or lexicon handles both scripts in one message.

See references/hebrew-sentiment-guide.md for the full treatment of these challenges, including the slang lexicon and negation-handling code.

Step 5: Intent Recognition Accuracy Tracking

Track how well your chatbot understands user requests over time:

Build IntentAccuracyTracker to log (predicted, actual, confidence, timestamp) per prediction and expose:

confusion_matrix(): 2D {actual: {predicted: count}} over the sorted intent universe.
misclassification_report(min_count=5): top (actual, predicted) pairs where predicted != actual.
low_confidence_intents(threshold=0.6): intents whose mean confidence is below threshold, with sample_count and below_threshold_pct.
accuracy_trend(): daily {date, accuracy, sample_count} series for plotting (bucket by timestamp[:10]).
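
The first two tracker methods above can be sketched as plain functions. This is a minimal illustration that assumes predictions are available as (predicted, actual) pairs; the confidence and timestamp fields are left out for brevity, and the standalone function form is an assumption rather than the skill's own class layout.

```python
from collections import Counter


def confusion_matrix(records):
    """Build {actual: {predicted: count}} over the sorted intent universe."""
    records = list(records)
    intents = sorted({label for pair in records for label in pair})
    matrix = {actual: {predicted: 0 for predicted in intents} for actual in intents}
    for predicted, actual in records:
        matrix[actual][predicted] += 1
    return matrix


def misclassification_report(records, min_count=5):
    """Return [((actual, predicted), count)] for wrong pairs, most frequent first."""
    errors = Counter(
        (actual, predicted) for predicted, actual in records if predicted != actual
    )
    return [(pair, n) for pair, n in errors.most_common() if n >= min_count]


if __name__ == "__main__":
    sample = [("billing", "billing"), ("billing", "refund"), ("refund", "refund")]
    print(confusion_matrix(sample))
    print(misclassification_report(sample, min_count=1))
```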

How to get ground truth labels:

Manual labeling: Sample 100-200 conversations per week and have Hebrew-speaking annotators label actual intents. This is the gold standard.
Escalation signals: When a user explicitly corrects the bot ("לא, התכוונתי ל...") or asks for a human agent after a misunderstanding, flag the prior intent as incorrect.
Post-chat surveys: Ask "Did the bot understand what you needed?" and correlate with detected intent.

Step 6: User Satisfaction Measurement

Combine multiple signals to build a satisfaction score:

from dataclasses import dataclass

@dataclass
class SatisfactionSignals:
    """Combine multiple satisfaction signals into a composite score."""

    # Direct feedback (if available)
    csat_score: float | None = None      # 1-5 scale
    thumbs_rating: str | None = None     # "up" or "down"

    # Behavioral signals
    session_resolved: bool = False
    escalated_to_human: bool = False
    abandoned: bool = False
    repeated_fallbacks: int = 0
    loop_detected: bool = False

    # Sentiment signals
    final_sentiment: str = "neutral"     # positive/neutral/negative
    sentiment_trend: str = "stable"      # improving/stable/declining

    def composite_score(self) -> float:
        """Composite satisfaction (0.0-1.0). If `csat_score` is present, return
        `(csat_score - 1) / 4` directly. Otherwise start at 0.5 (or 0.8/0.2 for
        thumbs up/down), then add: +0.15 resolved, -0.1 escalated, -0.2 abandoned,
        -0.15 repeated_fallbacks>2, -0.2 loop_detected, +/-0.1-0.15 final_sentiment,
        +/-0.05-0.1 sentiment_trend; clamp to [0, 1]."""
        ...
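
One possible way to fill in composite_score, following the docstring above. The docstring gives ranges for the sentiment adjustments, so the exact values used here (+0.1 / -0.15 for final sentiment, +0.05 / -0.1 for trend) are assumptions picked from inside those ranges, not the skill's definitive weights.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SatisfactionSignals:
    csat_score: Optional[float] = None   # 1-5 scale
    thumbs_rating: Optional[str] = None  # "up" or "down"
    session_resolved: bool = False
    escalated_to_human: bool = False
    abandoned: bool = False
    repeated_fallbacks: int = 0
    loop_detected: bool = False
    final_sentiment: str = "neutral"     # positive/neutral/negative
    sentiment_trend: str = "stable"      # improving/stable/declining

    def composite_score(self) -> float:
        # Direct CSAT wins over every inferred signal.
        if self.csat_score is not None:
            return (self.csat_score - 1) / 4
        score = {"up": 0.8, "down": 0.2}.get(self.thumbs_rating, 0.5)
        if self.session_resolved:
            score += 0.15
        if self.escalated_to_human:
            score -= 0.1
        if self.abandoned:
            score -= 0.2
        if self.repeated_fallbacks > 2:
            score -= 0.15
        if self.loop_detected:
            score -= 0.2
        # Assumed point values within the documented ranges.
        score += {"positive": 0.1, "negative": -0.15}.get(self.final_sentiment, 0.0)
        score += {"improving": 0.05, "declining": -0.1}.get(self.sentiment_trend, 0.0)
        return min(1.0, max(0.0, score))


if __name__ == "__main__":
    print(SatisfactionSignals(session_resolved=True).composite_score())
```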

Step 7: A/B Testing for Hebrew Response Variants

Test different phrasings, formality levels, and gender handling strategies:

Build HebrewABTestManager with three responsibilities:

Register a test. create_test(test_id, variants: {name: response_text}, traffic_split=None). Default split is uniform across variants. Store {variants, traffic_split, created_at} per test_id. Example variants:

{"formal": "שלום וברוכים הבאים. כיצד נוכל לסייע לכם?",
 "casual": "היי! איך אפשר לעזור?",
 "gender_neutral": "שלום! ניתן לבחור מהאפשרויות הבאות:"}

Deterministic bucketing. assign_variant(test_id, user_id) hashes f"{user_id}:{test_id}" with hashlib.md5, maps to a bucket in [0, 1), and walks the cumulative traffic_split so the same user always gets the same variant. Use this in get_response(...) and increment an impressions counter at the same time.
Outcome tracking. record_outcome(test_id, variant, completed=False, satisfaction=None, escalated=False) and get_test_results(test_id) returning per-variant {impressions, completion_rate, avg_satisfaction, escalation_rate}.
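
The deterministic bucketing step can be sketched as a standalone function. This is a minimal version under stated assumptions: the traffic split is passed in directly rather than looked up by test_id, and the first 8 hex digits of the MD5 digest are used to form the bucket (the text above does not say how many digits to use).

```python
import hashlib


def assign_variant(test_id: str, user_id: str, traffic_split: dict) -> str:
    """Map a user to a variant so the same user always gets the same one."""
    if not traffic_split:
        raise ValueError("traffic_split must contain at least one variant")
    # MD5 is used only for stable bucketing here, not for security.
    digest = hashlib.md5(f"{user_id}:{test_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    cumulative = 0.0
    for name, share in traffic_split.items():
        cumulative += share
        if bucket < cumulative:
            return name
    # Shares that sum slightly below 1.0 (float rounding) fall to the last variant.
    return name


if __name__ == "__main__":
    split = {"formal": 0.5, "casual": 0.5}
    print(assign_variant("greeting_test", "user-123", split))
```

Including test_id in the hashed string means a user who lands in "formal" for one test is not systematically placed in the first variant of every other test.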

Common Hebrew A/B test dimensions:

Step 8: Performance Dashboards and KPIs

Track these key metrics in your dashboard:

from dataclasses import dataclass

@dataclass
class ChatbotDashboard:
    """Key metrics for chatbot performance dashboard."""

    # Core metrics
    total_conversations: int = 0
    resolution_rate: float = 0.0        # % resolved without escalation
    first_contact_resolution: float = 0.0  # % resolved in first session
    avg_handle_time_seconds: float = 0.0
    escalation_rate: float = 0.0
    abandonment_rate: float = 0.0

    # User satisfaction
    avg_csat: float = 0.0               # 1-5 scale
    nps_score: float = 0.0              # -100 to 100
    thumbs_up_ratio: float = 0.0        # % positive

    # Intent accuracy
    intent_accuracy: float = 0.0        # % correctly classified
    fallback_rate: float = 0.0          # % of messages hitting fallback

    # Performance
    avg_response_time_ms: float = 0.0
    p95_response_time_ms: float = 0.0

    # Volume
    conversations_per_day: float = 0.0
    peak_hour: int = 0                  # 0-23
    busiest_day: str = ""               # "Sunday" etc.

    def to_report_dict(self) -> dict:
        """Group fields into core / satisfaction / accuracy / performance / volume
        sections for reporting (format rates as %, times as ms)."""
        ...

Implement build_dashboard(conversations, period_days=7) to populate the dataclass:

Outcome rates from Counter(c["outcome"]) / n.
avg_handle_time_seconds from (ended_at - started_at).total_seconds() per session.
avg_csat from satisfaction_score where present.
avg_response_time_ms / p95_response_time_ms from bot messages with response_time_ms (p95 via sorted_rts[int(len * 0.95)]).
intent_accuracy = share of user messages with intent_confidence > 0.7. This is a confidence-based proxy; true accuracy needs the ground-truth labels from Step 5. fallback_rate = share of user messages with intent == "fallback".
conversations_per_day = n / period_days. peak_hour and busiest_day from Counter over started_at hour and weekday.
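
A few of the build_dashboard calculations listed above, sketched as helper functions over the Step 1 log schema. The helper names are hypothetical, and timestamps are assumed to be ISO-8601 strings that datetime.fromisoformat accepts (on Python 3.10 that excludes a trailing "Z").

```python
from collections import Counter
from datetime import datetime


def outcome_rates(conversations):
    """Return {outcome: share of conversations} from the 'outcome' field."""
    n = len(conversations)
    if n == 0:
        return {}
    counts = Counter(c.get("outcome", "unknown") for c in conversations)
    return {outcome: count / n for outcome, count in counts.items()}


def avg_handle_time_seconds(conversations):
    """Mean session duration in seconds, skipping sessions without both timestamps."""
    durations = [
        (datetime.fromisoformat(c["ended_at"])
         - datetime.fromisoformat(c["started_at"])).total_seconds()
        for c in conversations
        if c.get("started_at") and c.get("ended_at")
    ]
    return sum(durations) / len(durations) if durations else 0.0


def response_time_stats(conversations):
    """Return (avg_ms, p95_ms) over bot messages that carry response_time_ms."""
    sorted_rts = sorted(
        m["response_time_ms"]
        for c in conversations
        for m in c.get("messages", [])
        if m.get("sender") == "bot" and m.get("response_time_ms") is not None
    )
    if not sorted_rts:
        return 0.0, 0.0
    avg = sum(sorted_rts) / len(sorted_rts)
    p95 = sorted_rts[int(len(sorted_rts) * 0.95)]
    return avg, float(p95)
```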

Israeli traffic patterns to expect:

Peak hours are typically 10:00-12:00 and 19:00-22:00 (Israel Time, UTC+2/+3)
Sunday is the busiest day (first workday of the Israeli week)
Friday afternoon and Saturday see minimal traffic
Holiday periods (Rosh Hashana, Pesach, Sukkot) show different patterns

Retention and Returning-User Metrics

For each user_id, collect the set of distinct dates with a conversation. Then:

D1 return rate = share whose first-date + 1 day is also in their set.
D7 return rate = share whose first-date + 2..7 days intersects their set.
Repeat-contact rate = share with > 1 distinct date.
D1 / D7 return rate: share of users who start a new conversation the day after, or within a week of, their first contact. D7 is more stable than D1 for low-volume Israeli bots.
Repeat-contact rate: share of users with more than one conversation. On a support bot this can be good (trust) or bad (unresolved issues), so read it with first-contact resolution.
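
The three retention definitions above can be computed in one pass. This sketch assumes each conversation carries a user_id and an ISO-8601 started_at whose first ten characters are the date; the function name `retention_metrics` is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def retention_metrics(conversations):
    """Return D1, D7, and repeat-contact rates as shares of distinct users."""
    dates_by_user = defaultdict(set)
    for conv in conversations:
        day = date.fromisoformat(conv["started_at"][:10])
        dates_by_user[conv["user_id"]].add(day)

    n_users = len(dates_by_user)
    if n_users == 0:
        return {"d1_return_rate": 0.0, "d7_return_rate": 0.0, "repeat_contact_rate": 0.0}

    d1 = d7 = repeat = 0
    for days in dates_by_user.values():
        first = min(days)
        if first + timedelta(days=1) in days:
            d1 += 1
        # D7 follows the definition above: any return on days 2..7 after first contact.
        if any(first + timedelta(days=k) in days for k in range(2, 8)):
            d7 += 1
        if len(days) > 1:
            repeat += 1
    return {
        "d1_return_rate": d1 / n_users,
        "d7_return_rate": d7 / n_users,
        "repeat_contact_rate": repeat / n_users,
    }
```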

Step 9: Hebrew-Specific Analytics Challenges

RTL Text in Charts and Visualizations

When rendering analytics dashboards that display Hebrew text, handle these RTL issues:

import matplotlib.pyplot as plt
import matplotlib

# Use a font that supports Hebrew
matplotlib.rcParams["font.family"] = ["DejaVu Sans", "Arial", "Heebo"]

# Tip: Use horizontal bar charts so Hebrew labels read naturally on the y-axis.
# For interactive dashboards, Plotly handles RTL better than matplotlib.
# Use font-family "Heebo, Arial, sans-serif" and add extra left margin for labels.

Hebrew Word Tokenization for Word Clouds

Standard whitespace tokenization does not work well for Hebrew due to prefix particles (ב, ה, ו, ל, מ, כ, ש):

# Standard whitespace tokenization fails for Hebrew due to prefix particles.
# Use YAP (https://github.com/OnlpLab/yap) for production, or strip common prefixes:
HEBREW_PREFIXES = ["ב", "ה", "ו", "ל", "מ", "כ", "ש", "וה", "של", "לה"]

# Strip prefixes only if word is long enough (>3 chars) and remainder >= 2 chars.
# For word clouds: use bidi algorithm to convert Hebrew for display,
# remove stopwords (של, את, על, עם, אני, זה, כי, גם, לא, יש, אין, מה).
# See references/hebrew-sentiment-guide.md for detailed tokenization code.
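
A minimal sketch of the prefix-stripping rule described in the comments above. It is a naive heuristic, not a morphological analyzer: it strips at most one prefix, tries longer prefixes first (an assumption), and will over-strip words whose first letters only look like a prefix (for example "שלום"). The function names are hypothetical.

```python
import re
from collections import Counter

# Longer prefixes first so that multi-letter prefixes win over single letters.
HEBREW_PREFIXES = ["וה", "של", "לה", "ב", "ה", "ו", "ל", "מ", "כ", "ש"]
HEBREW_STOPWORDS = {"של", "את", "על", "עם", "אני", "זה", "כי", "גם", "לא", "יש", "אין", "מה"}


def strip_prefix(word: str) -> str:
    """Strip one common prefix if the word has >3 chars and >=2 chars remain."""
    if len(word) <= 3:
        return word
    for prefix in HEBREW_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            return word[len(prefix):]
    return word


def word_frequencies(texts):
    """Count prefix-stripped Hebrew tokens across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Hebrew letters only (alef through tav); punctuation and Latin are ignored.
        for token in re.findall(r"[\u05D0-\u05EA]+", text):
            if token in HEBREW_STOPWORDS:
                continue
            counts[strip_prefix(token)] += 1
    return counts


if __name__ == "__main__":
    print(word_frequencies(["החשבון שלי", "בחשבון"]).most_common(3))
```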

Mixed Hebrew-English Query Handling

Israeli users frequently mix languages. Track language distribution and handle accordingly:

import re

def detect_message_language(text: str) -> str:
    """Detect primary language by counting Hebrew vs English characters."""
    hebrew_chars = len(re.findall(r'[\u0590-\u05FF]', text))
    english_chars = len(re.findall(r'[a-zA-Z]', text))
    total = hebrew_chars + english_chars
    if total == 0:
        return "unknown"
    return "he" if hebrew_chars / total >= 0.5 else "en"

# Track mixed-language rate: messages where 20-80% is Hebrew.
# Israeli users frequently code-switch between Hebrew and English.
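
The mixed-language rate mentioned in the comment above could be tracked as follows. This is a sketch that reuses the same character-counting idea as detect_message_language; the helper names `hebrew_share` and `mixed_language_rate` are hypothetical.

```python
import re
from typing import Optional


def hebrew_share(text: str) -> Optional[float]:
    """Return the share of Hebrew letters among Hebrew + English letters."""
    hebrew_chars = len(re.findall(r"[\u0590-\u05FF]", text))
    english_chars = len(re.findall(r"[a-zA-Z]", text))
    total = hebrew_chars + english_chars
    return None if total == 0 else hebrew_chars / total


def mixed_language_rate(messages) -> float:
    """Share of messages where 20-80% of the letters are Hebrew."""
    shares = [s for s in (hebrew_share(m) for m in messages) if s is not None]
    if not shares:
        return 0.0
    mixed = sum(1 for s in shares if 0.2 <= s <= 0.8)
    return mixed / len(shares)


if __name__ == "__main__":
    print(mixed_language_rate(["ה-support שלכם גרוע", "שלום", "hello"]))
```

Messages with no letters at all (digits, emoji) are excluded from the denominator so they do not dilute the rate.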

Step 10: Alerting and Anomaly Detection

Set up alerts to catch problems before they affect too many users:

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class AlertRule:
    """Define an alerting rule for chatbot metrics."""
    name: str
    metric: str
    operator: str          # "gt" (greater than), "lt" (less than)
    threshold: float
    window_minutes: int    # Rolling window
    severity: str          # "critical", "warning", "info"
    description_he: str    # Hebrew description for ops team


# Recommended alert rules for Hebrew chatbots
# AlertRule(name, metric, operator, threshold, window_minutes, severity, description_he)
DEFAULT_ALERT_RULES = [
    AlertRule("high_escalation_rate", "escalation_rate", "gt", 0.35, 60, "warning",
              "שיעור הסלמה גבוה מ-35% בשעה האחרונה"),
    AlertRule("satisfaction_drop", "avg_csat", "lt", 3.0, 120, "critical",
              "שביעות רצון ממוצעת ירדה מתחת ל-3.0 בשעתיים האחרונות"),
    AlertRule("high_abandonment", "abandonment_rate", "gt", 0.40, 60, "critical",
              "שיעור נטישה גבוה מ-40% בשעה האחרונה"),
    AlertRule("high_fallback_rate", "fallback_rate", "gt", 0.25, 30, "warning",
              "שיעור fallback גבוה מ-25% בחצי שעה האחרונה"),
    AlertRule("slow_response", "p95_response_time_ms", "gt", 3000, 15, "warning",
              "זמן תגובה P95 חורג מ-3 שניות ברבע השעה האחרון"),
    AlertRule("new_unrecognized_intents", "new_unknown_intents_count", "gt", 20, 60,
              "info", "יותר מ-20 כוונות לא מזוהות חדשות בשעה האחרונה"),
]
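
A sketch of how rules like these might be evaluated against a snapshot of current metrics. The `triggered_alerts` function is hypothetical, and computing the rolling-window metrics themselves is assumed to happen elsewhere; the dataclass is repeated only so the example runs on its own.

```python
import operator
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    metric: str
    operator: str          # "gt" or "lt"
    threshold: float
    window_minutes: int
    severity: str
    description_he: str


_OPERATORS = {"gt": operator.gt, "lt": operator.lt}


def triggered_alerts(rules, metrics):
    """Return the rules whose metric value crosses the threshold.

    `metrics` maps metric name to the value already aggregated over the
    rule's rolling window. Rules whose metric is missing are skipped.
    """
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue
        compare = _OPERATORS.get(rule.operator)
        if compare is None:
            raise ValueError(f"unknown operator: {rule.operator}")
        if compare(value, rule.threshold):
            fired.append(rule)
    return fired


if __name__ == "__main__":
    rules = [AlertRule("high_escalation_rate", "escalation_rate", "gt", 0.35, 60,
                       "warning", "שיעור הסלמה גבוה מ-35% בשעה האחרונה")]
    for rule in triggered_alerts(rules, {"escalation_rate": 0.41}):
        print(rule.severity, rule.name, rule.description_he)
```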

Step 11: Reporting Templates

Generate periodic reports summarizing chatbot performance:

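Implement generate_weekly_report(dashboard, period_start, period_end, previous_dashboard=None). The optional previous_dashboard goes last, because Python does not allow a parameter without a default after one with a default: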

Helper trend_arrow(current, previous, higher_is_better): returns (ללא שינוי) for < 1% delta; otherwise emits [v] +X.X% (good direction) or [!] +X.X% (bad direction).
Emit a # דוח ביצועי צ'אטבוט שבועי header, period subheader, and a | מדד | ערך | שינוי מהשבוע הקודם | markdown table over: שיחות, שיעור פתרון, CSAT, שיעור הסלמה (lower-is-better), שיעור נטישה (lower-is-better), דיוק זיהוי כוונות, זמן תגובה ממוצע (lower-is-better).
Append a ## תנועה block with conversations_per_day, peak_hour, busiest_day.
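
The trend_arrow helper described above might look like this. It is a sketch under assumptions: the delta is computed as a relative percentage of the previous value, and an empty string is returned when there is no previous value to compare against (the text above does not specify that case).

```python
def trend_arrow(current: float, previous, higher_is_better: bool = True) -> str:
    """Format the week-over-week change for the report table."""
    if previous is None or previous == 0:
        return ""  # assumption: nothing to compare against
    delta_pct = (current - previous) / abs(previous) * 100
    if abs(delta_pct) < 1.0:
        return "(ללא שינוי)"
    # A rise is good when higher is better; a fall is good when lower is better.
    improved = (delta_pct > 0) == higher_is_better
    marker = "[v]" if improved else "[!]"
    return f"{marker} {delta_pct:+.1f}%"


if __name__ == "__main__":
    print(trend_arrow(0.82, 0.75, higher_is_better=True))
    print(trend_arrow(0.30, 0.25, higher_is_better=False))
```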

Step 12: Integration with Chatbot Platforms

Dialogflow CX Analytics

Implement parse_dialogflow_cx_logs(bigquery_rows) to fold a Dialogflow CX BigQuery export into the standard conversations shape.

Export query: SELECT * FROM project.dataset.dialogflow_cx_interactions WHERE DATE(request_time) BETWEEN @start AND @end.
Group rows by session_id. For each session, track min/max request_time as started_at / ended_at.
For each row, append a user message (text = query_text, intent = matched_intent, intent_confidence) and/or bot message (text = response_text). Sort each session's messages by timestamp. Set language = "he", outcome = "unknown" (derive from flow completion downstream).
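
The grouping steps above can be sketched as follows. The column names (session_id, request_time, query_text, matched_intent, intent_confidence, response_text) are taken from the description above and may not match your actual BigQuery export schema; rows are assumed to be dicts with request_time as an ISO-8601 string in a single consistent format, so that string comparison orders them correctly.

```python
def parse_dialogflow_cx_logs(bigquery_rows):
    """Fold flat interaction rows into per-session conversation dicts."""
    sessions = {}
    for row in bigquery_rows:
        sid = row["session_id"]
        ts = row["request_time"]
        session = sessions.setdefault(sid, {
            "session_id": sid,
            "language": "he",
            "started_at": ts,
            "ended_at": ts,
            "messages": [],
            "outcome": "unknown",  # derive from flow completion downstream
        })
        session["started_at"] = min(session["started_at"], ts)
        session["ended_at"] = max(session["ended_at"], ts)
        if row.get("query_text"):
            session["messages"].append({
                "timestamp": ts,
                "sender": "user",
                "text": row["query_text"],
                "intent": row.get("matched_intent"),
                "intent_confidence": row.get("intent_confidence"),
            })
        if row.get("response_text"):
            session["messages"].append({
                "timestamp": ts,
                "sender": "bot",
                "text": row["response_text"],
            })
    for session in sessions.values():
        # Stable sort keeps the user message ahead of the bot reply from the same row.
        session["messages"].sort(key=lambda m: m["timestamp"])
    return list(sessions.values())
```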

Rasa Tracker Store Analytics

Implement parse_rasa_tracker_events(tracker_events) to fold a Rasa tracker-store stream into the standard conversations shape.

Query: SELECT * FROM events WHERE sender_id = @sender_id ORDER BY timestamp.
Iterate events. On session_started, flush the in-progress session and start a new one. On user, append a user message with intent.name and intent.confidence from parse_data. On bot, append a bot message with text. On action with name == "action_human_handoff", set outcome = "escalated". Flush the trailing session at the end.
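
A sketch of the event loop described above. It assumes each event is a dict in the serialized Rasa event shape ("event", "timestamp", and for user events a "parse_data" dict with an "intent" entry), that all events belong to one sender_id, and that empty sessions are dropped when flushed; that last choice is an assumption, not something the description specifies.

```python
def _new_session(timestamp):
    return {"started_at": timestamp, "ended_at": timestamp, "language": "he",
            "messages": [], "outcome": "unknown"}


def parse_rasa_tracker_events(tracker_events):
    """Fold an ordered Rasa event stream into a list of conversation dicts."""
    conversations = []
    current = None
    for event in tracker_events:
        kind = event.get("event")
        ts = event.get("timestamp")
        if kind == "session_started":
            # Flush the in-progress session (if it has content) and start a new one.
            if current is not None and current["messages"]:
                conversations.append(current)
            current = _new_session(ts)
            continue
        if current is None:
            current = _new_session(ts)
        if kind == "user":
            intent = (event.get("parse_data") or {}).get("intent") or {}
            current["messages"].append({
                "timestamp": ts, "sender": "user", "text": event.get("text"),
                "intent": intent.get("name"),
                "intent_confidence": intent.get("confidence"),
            })
        elif kind == "bot":
            current["messages"].append(
                {"timestamp": ts, "sender": "bot", "text": event.get("text")})
        elif kind == "action" and event.get("name") == "action_human_handoff":
            current["outcome"] = "escalated"
        current["ended_at"] = ts
    # Flush the trailing session.
    if current is not None and current["messages"]:
        conversations.append(current)
    return conversations
```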

WhatsApp Business Platform pricing notes

Two free windows worth tracking explicitly in your analytics:

24-hour service window. When a user sends an inbound message, you can reply with free-form text (no template, no charge) for the next 24 hours. Optimizing analytics for "did we resolve in the service window?" can eliminate a whole template-cost line item for reactive support flows. See https://developers.facebook.com/documentation/business-messaging/whatsapp/pricing.
72-hour click-to-WhatsApp / Facebook ad window. When the user arrives from a click-to-WhatsApp ad or a Facebook Page CTA, all messages (including templates) are free for 72 hours.

Anti-spam compliance (Israel Communications Law, Section 30A)

Practical analytics tracking:

Tag every send as opt_in_basis: "explicit_form" / "ctw_ad_click" / "service_reply" / "transactional". This is your audit trail if a complaint reaches the Ministry of Communications.
Track unsubscribe path success rate. Marketing messages must include the word "advertisement" (פרסומת), the sender's name and address, and a working opt-out path. Measure the time-to-unsubscribe and the success rate of the opt-out flow as a compliance KPI.
Service vs. marketing split. Run completion-rate and CSAT separately for opt-in marketing flows vs. user-initiated service flows. They behave very differently, and combining them masks both.
Cross-reference: gws-hebrew-email-automation and israeli-telegram-business-bot cover the same opt-in regime for email and Telegram. Use those skills if you also operate those channels.

Experimentation platforms for Hebrew chatbots

Modern analytics stack notes (GA4 + Mixpanel, 2026)

GA4 "AI Assistant" channel. GA4 now ships a built-in Channel Group: AI Assistant (Medium ai-assistant) that auto-categorizes traffic from ChatGPT, Gemini, and Claude (Perplexity reportedly included; Google has not formally confirmed). If you embed your bot on a marketing site, this is the easiest way to attribute incoming traffic referred by an LLM to the bot's funnel, no custom regex needed (https://martech.org/ga4-now-tracks-ai-chatbot-traffic-automatically/).
Mixpanel Spark + MCP Server. Mixpanel released Spark (AI query builder) and an MCP server in 2025-2026 that lets Claude / ChatGPT / Cursor query Mixpanel data conversationally. For Hebrew dashboards specifically this matters because you can ask follow-up questions in Hebrew and Spark routes them to the right event/property, useful when the ops team is not fluent in funnel-query UI.

Examples

Example 1: Analyze chatbot performance for the past week

User says: "Analyze my Hebrew chatbot logs from the past week and show me where users are dropping off."

Actions:

Load conversation logs from the specified time period.
Run compute_flow_metrics() to get session-level stats.
Run detect_drop_off_points() to find abandonment patterns.
Run detect_conversation_loops() to identify stuck users.
Generate a summary with actionable recommendations.

Result: Report with completion rate, top drop-off points, looping conversations, and abandonment patterns.

Example 2: Set up A/B testing for greeting messages

User says: "I want to test whether a formal or casual Hebrew greeting works better."

Actions:

Create an A/B test with HebrewABTestManager.create_test().
Define variants: formal ("כיצד נוכל לסייע לכם היום?") vs. casual ("היי! מה אפשר לעשות בשבילך?").
Configure traffic split (50/50).
Integrate with the bot's greeting handler.
Set up outcome tracking (completion rate, CSAT, escalation).

Result: Running A/B test with deterministic user assignment and statistical outcome tracking.

Example 3: Set up anomaly alerting

User says: "Alert me if chatbot satisfaction drops suddenly."

Actions:

Configure AlertManager with satisfaction and escalation rules.
Set up rolling window calculations for recent metrics.
Connect alerts to notification channels (Slack, email, PagerDuty).
Add Hebrew-language alert descriptions for the ops team.

Result: Real-time monitoring that triggers alerts when CSAT drops below 3.0, escalation rate exceeds 35%, or abandonment spikes above 40%.

Example 4: Generate a weekly performance report

User says: "Create a Hebrew weekly report for the chatbot team."

Actions:

Run build_dashboard() for the current and previous weeks.
Call generate_weekly_report() with both dashboards for trend arrows.
Include drop-off analysis and intent accuracy breakdown.
Format output in Hebrew with RTL-compatible tables.

Result: A formatted Hebrew report with week-over-week comparisons, trend indicators, and key metrics ready to share with the team.

Bundled Resources

Scripts

scripts/conversation-analyzer.py -- Analyze chatbot conversation logs for key metrics (drop-off, sentiment, resolution). Run: python scripts/conversation-analyzer.py --help

References

references/chatbot-metrics-glossary.md -- Glossary of chatbot analytics metrics with Hebrew translations and industry benchmarks. Consult when defining KPIs or explaining metrics to Hebrew-speaking stakeholders.
references/hebrew-sentiment-guide.md -- Guide to Hebrew sentiment analysis challenges including negation, sarcasm, slang, and mixed-language handling. Consult when building or tuning Hebrew sentiment models.

Install manually

# Clone the repo
git clone https://github.com/skills-il/developer-tools.git

# Copy into Claude Code skills folder (global)
cp -r developer-tools/israeli-chatbot-analytics ~/.claude/skills/

Repository: skills-il/developer-tools (9 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT