Name: kql
Author: microsoft

KQL Mastery

Try it yourself: All ✅ examples in this skill can be run against the public help cluster: https://help.kusto.windows.net, database Samples (contains StormEvents, SimpleGraph_Nodes/Edges, nyc_taxi, and more).

1. KQL Basics

Kusto Query Language (KQL) is a pipe-forward query language for exploring data. It is the native query language for Azure Data Explorer (ADX), Microsoft Fabric Real-Time Intelligence (EventHouse), Azure Monitor Log Analytics, Microsoft Sentinel, and other Microsoft data services.

Pipe-forward syntax

KQL queries are a chain of operators separated by |. Data flows left to right:

StormEvents                          // start with a table
| where State == "TEXAS"             // filter rows
| summarize count() by EventType     // aggregate
| top 5 by count_ desc              // limit results

Query vs management commands

KQL has two execution planes:

| Plane | Starts with | Examples | |-------|-------------|----------| | Query | Table name, let, print, datatable | StormEvents \| where State == "TEXAS" | | Management | .show, .create, .set, .drop, .alter | .show tables, .show table T schema |

Management commands can be followed by query operators (the output is tabular), but the entire request runs on the management plane. You cannot start with a query and pipe into a management command.

// ✅ WORKS — management command piped to query operators
.show tables | project TableName | where TableName has "Events"

// ❌ WRONG — query piped into management command
StormEvents | take 5 | .show tables

When in doubt: if the first token starts with ., it's a management command. For a full catalog of schema exploration commands, see references/discovery-queries.md.

2. Dynamic Type Discipline

KQL's dynamic type is flexible but strict in certain contexts. A common mistake is using a dynamic column in summarize by, order by, or join on without casting.

The rule: Any time you use a dynamic-typed column in by, on, or order by, wrap it in an explicit cast.

// ❌ ERROR: "Summarize group key ... is of a 'dynamic' type"
StormEvents | summarize count() by StormSummary.Details.Location

// ✅ FIX
StormEvents | summarize count() by tostring(StormSummary.Details.Location)

// ❌ ERROR: "order operator: key can't be of dynamic type"
StormEvents | order by StormSummary.TotalDamages desc

// ✅ FIX
StormEvents | order by tolong(StormSummary.TotalDamages) desc

// ❌ ERROR in join: dynamic join key
StormEvents | join kind=inner (PopulationData) on $left.StormSummary == $right.State

// ✅ FIX — cast both sides
StormEvents
| extend State_str = tostring(StormSummary.Details.Location)
| join kind=inner (PopulationData) on $left.State_str == $right.State

Self-correction: When you see "is of a 'dynamic' type" in an error, add tostring(), tolong(), or todouble().

3. Join Patterns & Pitfalls

KQL joins have constraints that differ from SQL.

Equality only

KQL join conditions support only ==. No <, >, !=, or function calls in join predicates.

// ❌ ERROR: "Only equality is allowed in this context"
StormEvents | join (nyc_taxi) on geo_distance_2points(BeginLon, BeginLat, pickup_longitude, pickup_latitude) < 1000

// ✅ WORKAROUND — pre-bucket into spatial cells, then join on cell ID
StormEvents
| extend cell = geo_point_to_s2cell(BeginLon, BeginLat, 8)
| join kind=inner (nyc_taxi | extend cell = geo_point_to_s2cell(pickup_longitude, pickup_latitude, 8)) on cell

For range joins, pre-bin values: | extend bin_val = bin(Value, 100), then join on bin_val. Note: values near bin boundaries may land in adjacent bins — consider checking neighboring bins or overlapping the range for precision.

Left/right attribute matching

Both sides of a join on clause must reference column entities only — not expressions, not aggregates.

// ❌ ERROR: "for each left attribute, right attribute should be selected"
StormEvents | join kind=inner (PopulationData) on $left.State

// ✅ FIX — specify both sides explicitly
StormEvents | join kind=inner (PopulationData) on $left.State == $right.State

Cardinality check before large joins

Always check cardinality before joining tables with >10K rows. A cross-join explosion was the source of the single E_RUNAWAY_QUERY error (25K × 195 = potential 4.8M rows).

// Before joining, check how many rows each side contributes
StormEvents | summarize dcount(State)        // → 67 distinct states
PopulationData | summarize dcount(State)     // → 52 — safe to join

4. Regex in KQL

KQL handles regex natively — no need for Python.

The `extract_all` gotcha

Unlike Python's re.findall(), KQL's extract_all requires capturing groups in the regex:

// ❌ ERROR: "extractall(): argument 2 must be a valid regex with [1..16] matching groups"
StormEvents | extend words = extract_all(@"[a-zA-Z]{3,}", EventNarrative)

// ✅ FIX — add parentheses around the pattern
StormEvents | extend words = extract_all(@"([a-zA-Z]{3,})", EventNarrative)

Regex toolkit — don't fall back to Python

| Function | Use case | Example | |----------|----------|---------| | extract(regex, group, source) | Single match | extract(@"User '([^']+)'", 1, Msg) | | extract_all(regex, source) | All matches (needs ()) | extract_all(@"(\w+)", Text) | | parse | Structured extraction | parse Msg with * "User '" Sender "' sent" * | | matches regex | Boolean filter | where Url matches regex @"^https?://" | | replace_regex | Find and replace | replace_regex(Text, @"\s+", " ") |

5. Serialization Requirements

Window functions need serialized (ordered) input.

// ❌ ERROR: "Function 'row_cumsum' cannot be invoked. The row set must be serialized."
StormEvents
| where State == "TEXAS"
| summarize DailyCount = count() by bin(StartTime, 1d)
| extend CumulativeCount = row_cumsum(DailyCount)

// ✅ FIX — add | serialize (or | order by, which implicitly serializes)
StormEvents
| where State == "TEXAS"
| summarize DailyCount = count() by bin(StartTime, 1d)
| order by StartTime asc
| extend CumulativeCount = row_cumsum(DailyCount)

Functions requiring serialization: row_number(), row_cumsum(), prev(), next(), row_window_session().

6. Memory-Safe Query Patterns

The most common memory error. Caused by scanning too much data without pre-filtering.

The progression of safety

Safest ──────────────────────────────────────────────── Most dangerous
| count    | take 10    | where + summarize    | summarize (no filter)    | full scan

Rules for large tables (>1M rows)

Always start with | count to understand table size
Always | where before | summarize — filter time range, partition key, or category first
Never dcount() on high-cardinality columns without pre-filtering
Check join cardinality before executing (see Section 3)
Use materialize() for subqueries referenced multiple times

// ❌ OUT OF MEMORY — large table, no filter, many group-by columns
StormEvents
| summarize dcount(EventType), count() by StartTime, State, Source
| where dcount_EventType > 1

// ✅ SAFE — filter first, then aggregate
StormEvents
| where StartTime between (datetime(2007-04-15) .. datetime(2007-04-16))
| summarize dcount(EventType) by State, Source
| where dcount_EventType > 1

When you see `E_LOW_MEMORY_CONDITION`

The query touched too much data. Your options:

Add | where filters (time range, partition key)
Reduce the number of by columns in summarize
Break into smaller time windows and union results
Use | sample 10000 for exploratory work instead of full scans

When you see `E_RUNAWAY_QUERY`

A join or aggregation produced too many output rows. Check join cardinality — one or both sides is too large.

7. Result Size Discipline

Large results slow down analysis. Prevention:

| Query type | Safeguard | |-----------|-----------| | Exploratory | Always end with \| take 10 or \| take 20 | | Aggregation | Use \| top 20 by ... not unbounded summarize | | Wide rows (vectors, JSON) | \| project only needed columns | | make_list() / make_set() | Avoid on high-cardinality groups (produces huge cells) | | Unknown size | Run \| count first |

The vector trap: Tables with embedding columns (1536-dim float arrays) produce ~30KB per row. Even | take 20 yields 600KB. Always | project away vector columns unless you specifically need them.

8. String Comparison Strictness

KQL sometimes requires explicit casts when comparing computed string values — even when both sides are already strings.

// ❌ ERROR: "Cannot compare values of types string and string. Try adding explicit casts"
StormEvents | where geo_point_to_s2cell(BeginLon, BeginLat, 16) == other_cell

// ✅ FIX — wrap both sides in tostring()
StormEvents | where tostring(geo_point_to_s2cell(BeginLon, BeginLat, 16)) == tostring(other_cell)

This is most common with computed values from geo_point_to_s2cell() and strcat() comparisons. When in doubt, cast with tostring().

9. Advanced Functions

KQL handles these natively — no need for Python:

Vector similarity

// try it! — cosine similarity on Iris feature vectors
let target = pack_array(5.1, 3.5, 1.4, 0.2);
Iris
| extend Vec = pack_array(SepalLength, SepalWidth, PetalLength, PetalWidth)
| extend sim = series_cosine_similarity(Vec, target)
| top 5 by sim desc

Geo operations

// Distance between two points (meters)
StormEvents | extend dist = geo_distance_2points(BeginLon, BeginLat, EndLon, EndLat)

// Spatial bucketing for joins
StormEvents | extend cell = geo_point_to_s2cell(BeginLon, BeginLat, 8)

Graph queries

// Persistent graph model — try it on the help cluster!
graph("Simple")
| graph-match (src)-[e*1..3]->(dst)
  where src.name == "Alice"
  project src.name, dst.name, path_length = array_length(e)

// Transient graph — build inline with make-graph
SimpleGraph_Edges
| make-graph source --> target with SimpleGraph_Nodes on id
| graph-match (src)-[e*1..5]->(dst)
  where src.name == "Alice"
  project src.name, dst.name, path_length = array_length(e)

Time series

// try it! — create a time series and detect anomalies
StormEvents
| make-series count() default=0 on StartTime step 1d
| extend anomalies = series_decompose_anomalies(count_)

For detailed examples and patterns, consult references/advanced-patterns.md.

10. Self-Correction Lookup Table

When you encounter an error, look it up here before retrying:

| Error message contains | Likely cause | Fix | |---|---|---| | is of a 'dynamic' type | Dynamic column in by/on/order by | Wrap in tostring()/tolong() | | Only equality is allowed | Range predicate in join condition | Pre-bucket with S2/H3 cells or bin() | | extractall(): matching groups | Missing () in regex | Add (): @"(\w+)" not @"\w+" | | row set must be serialized | Window function on unsorted data | Add \| serialize or \| order by before it | | Cannot compare values of types string and string | Computed string comparison | Add tostring() on both sides | | Failed to resolve column named 'X' | Wrong column name or wrong table | Run .show table T schema to check column names | | E_LOW_MEMORY_CONDITION | Query touched too much data | Add \| where filters, reduce time range, break into steps | | E_RUNAWAY_QUERY | Join/aggregation produced too many rows | Check cardinality before joining; add pre-filters | | for each left attribute, right attribute | Join on clause incomplete | Use explicit form: on $left.X == $right.Y | | needs to be bracketed | Reserved word used as identifier | Use ['keyword'] syntax | | plugin doesn't exist | Unavailable plugin on this cluster | Fall back to equivalent function or Python | | Expected string literal in datetime() | Bare integer in datetime literal | Use datetime(2024-01-01) not datetime(2024) | | Unexpected token after by | Complex expression in summarize by-clause | extend the expression first, then summarize by the column | | not recognized / unknown operator | Operator not available on this engine | Check operator support; try equivalent (order by = sort by) |

11. Datetime Pitfalls

Datetime literals are a common source of errors. A wrong literal format can cascade into completely different approaches instead of fixing the small issue.

Literal format

// ❌ WRONG — bare year is not a valid datetime
StormEvents | where StartTime > datetime(2007)

// ✅ RIGHT — always use full date format
StormEvents | where StartTime > datetime(2007-01-01)

Filtering by year, month, or hour

// ❌ WRONG — comparing datetime column to integer
StormEvents | where StartTime == 2007

// ✅ RIGHT — use datetime_part() to extract components
StormEvents | where datetime_part("year", StartTime) == 2007

// ✅ ALSO RIGHT — use between with datetime range
StormEvents | where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31T23:59:59))

Time bucketing in summarize

// This works, but can be harder to read and reuse in complex queries
StormEvents | summarize count() by startofmonth(StartTime)

// Clearer — extend first, then summarize by the computed column
StormEvents
| extend Month = startofmonth(StartTime)
| summarize count() by Month
| order by Month asc

Useful datetime functions

| Function | Purpose | Example | |----------|---------|---------| | bin(ts, 1h) | Round down to bucket boundary | bin(Timestamp, 1d) | | startofmonth(ts) | First day of month | startofmonth(Timestamp) | | datetime_part("hour", ts) | Extract component | datetime_part("year", Timestamp) | | format_datetime(ts, fmt) | Format as string | format_datetime(Timestamp, "yyyy-MM") | | ago(1d) | Relative time | where Timestamp > ago(1d) | | between(a .. b) | Range filter (inclusive) | where Timestamp between (datetime(2024-01-01) .. datetime(2024-01-31T23:59:59)) | | todatetime(str) | Parse string → datetime | todatetime("2024-01-15T10:30:00Z") | | totimespan(str) | Parse string → timespan | totimespan("01:30:00") |

12. Operator Naming & Equality

KQL has subtle differences from SQL syntax.

Naming conventions

| Entity | Convention | Example | |--------|-----------|---------| | Tables | UpperCamelCase | StormEvents, NetworkLogs | | Columns | UpperCamelCase | StartTime, EventType | | Variables (let) | snake_case | let filtered_events = ... | | Built-in functions | snake_case | format_bytes(), geo_distance_2points() | | Stored functions | UpperCamelCase | .create function GetTopUsers |

Equality operators

// In where clauses, == is case-sensitive, =~ is case-insensitive
StormEvents | where State == "TEXAS" | count        // exact match
StormEvents | where State =~ "texas" | count        // case-insensitive

// In joins, use == only
StormEvents | join kind=inner (PopulationData) on State

sort vs order

Both sort by and order by work identically in KQL — they are aliases. Use whichever you prefer, but be consistent.

contains vs has

// contains: substring match (slower)
StormEvents | where EventNarrative contains "tree"   // finds "trees", "treetop" too

// has: term/word match (faster, uses index)
StormEvents | where EventNarrative has "tree"        // matches word boundaries only

// For exact prefix/suffix
StormEvents | where EventType startswith "Thunder"
StormEvents | where Source endswith "Spotter"

13. Error Recovery Strategy

When a first KQL query fails, the temptation is to abandon the entire approach and try something completely different. The correct response is almost always to fix the specific error, not change strategy.

The pattern to avoid

Query 1: extract(@"pattern", 1, col)  → Parse error
Query 2: todynamic(col)               → Different error  
Query 3: parse_json(col)              → Another error
Query 4: Python script                → Works but 10x tokens

The correct pattern

Query 1: extract(@"pattern", 1, col)  → Parse error (bad escaping)
Query 2: extract(@"pattern", 1, col)  → Fix the specific escaping issue → Success

Rules for error recovery:

Read the error message carefully — it almost always tells you exactly what's wrong
Fix the specific syntax/escaping issue, don't switch approaches
Use the self-correction table (Section 10) to map errors to fixes
Only switch approaches after 2 failed fixes of the same query
The parse operator is often simpler than extract() for structured text:

// Instead of complex regex on TraceLogs:
// extract(@"file path: \"\"([^\"]+)\"\"", 1, Message)

// Use parse for structured extraction (try it on help cluster, SampleLogs db):
cluster("help").database("SampleLogs").TraceLogs
| where Message has "file path"
| parse Message with * "file path: \"\"" FilePath "\"\"" *
| project Timestamp, FilePath
| take 5

14. Query Writing Checklist

Before running any KQL query, mentally check:

Pre-filtered? Large tables have a | where before any | summarize
Result bounded? Exploratory queries end with | take N or | top N
Dynamic columns cast? Any dynamic column in by/on/order by is wrapped
Regex has groups? extract_all patterns have () around what you want to capture
Join cardinality safe? Both sides checked with dcount() before joining
Needed columns only? Wide tables get | project to drop unneeded columns
Datetime literals valid? Using datetime(2024-01-01) not datetime(2024) or bare integers
Complex by-expressions? Use | extend first, then | summarize by the computed column
Error recovery plan? If a query fails, fix the specific error — don't change strategy

KQL Mastery

Try it yourself: All ✅ examples in this skill can be run against the public help cluster: https://help.kusto.windows.net, database Samples (contains StormEvents, SimpleGraph_Nodes/Edges, nyc_taxi, and more).

1. KQL Basics

Pipe-forward syntax

KQL queries are a chain of operators separated by |. Data flows left to right:

StormEvents                          // start with a table
| where State == "TEXAS"             // filter rows
| summarize count() by EventType     // aggregate
| top 5 by count_ desc              // limit results

Query vs management commands

KQL has two execution planes:

Management commands can be followed by query operators (the output is tabular), but the entire request runs on the management plane. You cannot start with a query and pipe into a management command.

// ✅ WORKS — management command piped to query operators
.show tables | project TableName | where TableName has "Events"

// ❌ WRONG — query piped into management command
StormEvents | take 5 | .show tables

When in doubt: if the first token starts with ., it's a management command. For a full catalog of schema exploration commands, see references/discovery-queries.md.

2. Dynamic Type Discipline

KQL's dynamic type is flexible but strict in certain contexts. A common mistake is using a dynamic column in summarize by, order by, or join on without casting.

The rule: Any time you use a dynamic-typed column in by, on, or order by, wrap it in an explicit cast.

// ❌ ERROR: "Summarize group key ... is of a 'dynamic' type"
StormEvents | summarize count() by StormSummary.Details.Location

// ✅ FIX
StormEvents | summarize count() by tostring(StormSummary.Details.Location)

// ❌ ERROR: "order operator: key can't be of dynamic type"
StormEvents | order by StormSummary.TotalDamages desc

// ✅ FIX
StormEvents | order by tolong(StormSummary.TotalDamages) desc

// ❌ ERROR in join: dynamic join key
StormEvents | join kind=inner (PopulationData) on $left.StormSummary == $right.State

// ✅ FIX — cast both sides
StormEvents
| extend State_str = tostring(StormSummary.Details.Location)
| join kind=inner (PopulationData) on $left.State_str == $right.State

Self-correction: When you see "is of a 'dynamic' type" in an error, add tostring(), tolong(), or todouble().

3. Join Patterns & Pitfalls

KQL joins have constraints that differ from SQL.

Equality only

KQL join conditions support only ==. No <, >, !=, or function calls in join predicates.

// ❌ ERROR: "Only equality is allowed in this context"
StormEvents | join (nyc_taxi) on geo_distance_2points(BeginLon, BeginLat, pickup_longitude, pickup_latitude) < 1000

// ✅ WORKAROUND — pre-bucket into spatial cells, then join on cell ID
StormEvents
| extend cell = geo_point_to_s2cell(BeginLon, BeginLat, 8)
| join kind=inner (nyc_taxi | extend cell = geo_point_to_s2cell(pickup_longitude, pickup_latitude, 8)) on cell

Left/right attribute matching

Both sides of a join on clause must reference column entities only — not expressions, not aggregates.

// ❌ ERROR: "for each left attribute, right attribute should be selected"
StormEvents | join kind=inner (PopulationData) on $left.State

// ✅ FIX — specify both sides explicitly
StormEvents | join kind=inner (PopulationData) on $left.State == $right.State

Cardinality check before large joins

Always check cardinality before joining tables with >10K rows. A cross-join explosion was the source of the single E_RUNAWAY_QUERY error (25K × 195 = potential 4.8M rows).

// Before joining, check how many rows each side contributes
StormEvents | summarize dcount(State)        // → 67 distinct states
PopulationData | summarize dcount(State)     // → 52 — safe to join

4. Regex in KQL

KQL handles regex natively — no need for Python.

The `extract_all` gotcha

Unlike Python's re.findall(), KQL's extract_all requires capturing groups in the regex:

// ❌ ERROR: "extractall(): argument 2 must be a valid regex with [1..16] matching groups"
StormEvents | extend words = extract_all(@"[a-zA-Z]{3,}", EventNarrative)

// ✅ FIX — add parentheses around the pattern
StormEvents | extend words = extract_all(@"([a-zA-Z]{3,})", EventNarrative)

Regex toolkit — don't fall back to Python

5. Serialization Requirements

Window functions need serialized (ordered) input.

// ❌ ERROR: "Function 'row_cumsum' cannot be invoked. The row set must be serialized."
StormEvents
| where State == "TEXAS"
| summarize DailyCount = count() by bin(StartTime, 1d)
| extend CumulativeCount = row_cumsum(DailyCount)

// ✅ FIX — add | serialize (or | order by, which implicitly serializes)
StormEvents
| where State == "TEXAS"
| summarize DailyCount = count() by bin(StartTime, 1d)
| order by StartTime asc
| extend CumulativeCount = row_cumsum(DailyCount)

Functions requiring serialization: row_number(), row_cumsum(), prev(), next(), row_window_session().

6. Memory-Safe Query Patterns

The most common memory error. Caused by scanning too much data without pre-filtering.

The progression of safety

Safest ──────────────────────────────────────────────── Most dangerous
| count    | take 10    | where + summarize    | summarize (no filter)    | full scan

Rules for large tables (>1M rows)

Always start with | count to understand table size
Always | where before | summarize — filter time range, partition key, or category first
Never dcount() on high-cardinality columns without pre-filtering
Check join cardinality before executing (see Section 3)
Use materialize() for subqueries referenced multiple times

// ❌ OUT OF MEMORY — large table, no filter, many group-by columns
StormEvents
| summarize dcount(EventType), count() by StartTime, State, Source
| where dcount_EventType > 1

// ✅ SAFE — filter first, then aggregate
StormEvents
| where StartTime between (datetime(2007-04-15) .. datetime(2007-04-16))
| summarize dcount(EventType) by State, Source
| where dcount_EventType > 1

When you see `E_LOW_MEMORY_CONDITION`

The query touched too much data. Your options:

Add | where filters (time range, partition key)
Reduce the number of by columns in summarize
Break into smaller time windows and union results
Use | sample 10000 for exploratory work instead of full scans

When you see `E_RUNAWAY_QUERY`

A join or aggregation produced too many output rows. Check join cardinality — one or both sides is too large.

7. Result Size Discipline

Large results slow down analysis. Prevention:

8. String Comparison Strictness

KQL sometimes requires explicit casts when comparing computed string values — even when both sides are already strings.

// ❌ ERROR: "Cannot compare values of types string and string. Try adding explicit casts"
StormEvents | where geo_point_to_s2cell(BeginLon, BeginLat, 16) == other_cell

// ✅ FIX — wrap both sides in tostring()
StormEvents | where tostring(geo_point_to_s2cell(BeginLon, BeginLat, 16)) == tostring(other_cell)

This is most common with computed values from geo_point_to_s2cell() and strcat() comparisons. When in doubt, cast with tostring().

9. Advanced Functions

KQL handles these natively — no need for Python:

Vector similarity

// try it! — cosine similarity on Iris feature vectors
let target = pack_array(5.1, 3.5, 1.4, 0.2);
Iris
| extend Vec = pack_array(SepalLength, SepalWidth, PetalLength, PetalWidth)
| extend sim = series_cosine_similarity(Vec, target)
| top 5 by sim desc

Geo operations

// Distance between two points (meters)
StormEvents | extend dist = geo_distance_2points(BeginLon, BeginLat, EndLon, EndLat)

// Spatial bucketing for joins
StormEvents | extend cell = geo_point_to_s2cell(BeginLon, BeginLat, 8)

Graph queries

// Persistent graph model — try it on the help cluster!
graph("Simple")
| graph-match (src)-[e*1..3]->(dst)
  where src.name == "Alice"
  project src.name, dst.name, path_length = array_length(e)

// Transient graph — build inline with make-graph
SimpleGraph_Edges
| make-graph source --> target with SimpleGraph_Nodes on id
| graph-match (src)-[e*1..5]->(dst)
  where src.name == "Alice"
  project src.name, dst.name, path_length = array_length(e)

Time series

// try it! — create a time series and detect anomalies
StormEvents
| make-series count() default=0 on StartTime step 1d
| extend anomalies = series_decompose_anomalies(count_)

For detailed examples and patterns, consult references/advanced-patterns.md.

10. Self-Correction Lookup Table

When you encounter an error, look it up here before retrying:

11. Datetime Pitfalls

Datetime literals are a common source of errors. A wrong literal format can cascade into completely different approaches instead of fixing the small issue.

Literal format

// ❌ WRONG — bare year is not a valid datetime
StormEvents | where StartTime > datetime(2007)

// ✅ RIGHT — always use full date format
StormEvents | where StartTime > datetime(2007-01-01)

Filtering by year, month, or hour

// ❌ WRONG — comparing datetime column to integer
StormEvents | where StartTime == 2007

// ✅ RIGHT — use datetime_part() to extract components
StormEvents | where datetime_part("year", StartTime) == 2007

// ✅ ALSO RIGHT — use between with datetime range
StormEvents | where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31T23:59:59))

Time bucketing in summarize

// This works, but can be harder to read and reuse in complex queries
StormEvents | summarize count() by startofmonth(StartTime)

// Clearer — extend first, then summarize by the computed column
StormEvents
| extend Month = startofmonth(StartTime)
| summarize count() by Month
| order by Month asc

Useful datetime functions

12. Operator Naming & Equality

KQL has subtle differences from SQL syntax.

Naming conventions

Equality operators

// In where clauses, == is case-sensitive, =~ is case-insensitive
StormEvents | where State == "TEXAS" | count        // exact match
StormEvents | where State =~ "texas" | count        // case-insensitive

// In joins, use == only
StormEvents | join kind=inner (PopulationData) on State

sort vs order

Both sort by and order by work identically in KQL — they are aliases. Use whichever you prefer, but be consistent.

contains vs has

// contains: substring match (slower)
StormEvents | where EventNarrative contains "tree"   // finds "trees", "treetop" too

// has: term/word match (faster, uses index)
StormEvents | where EventNarrative has "tree"        // matches word boundaries only

// For exact prefix/suffix
StormEvents | where EventType startswith "Thunder"
StormEvents | where Source endswith "Spotter"

13. Error Recovery Strategy

The pattern to avoid

Query 1: extract(@"pattern", 1, col)  → Parse error
Query 2: todynamic(col)               → Different error  
Query 3: parse_json(col)              → Another error
Query 4: Python script                → Works but 10x tokens

The correct pattern

Query 1: extract(@"pattern", 1, col)  → Parse error (bad escaping)
Query 2: extract(@"pattern", 1, col)  → Fix the specific escaping issue → Success

Rules for error recovery:

Read the error message carefully — it almost always tells you exactly what's wrong
Fix the specific syntax/escaping issue, don't switch approaches
Use the self-correction table (Section 10) to map errors to fixes
Only switch approaches after 2 failed fixes of the same query
The parse operator is often simpler than extract() for structured text:

// Instead of complex regex on TraceLogs:
// extract(@"file path: \"\"([^\"]+)\"\"", 1, Message)

// Use parse for structured extraction (try it on help cluster, SampleLogs db):
cluster("help").database("SampleLogs").TraceLogs
| where Message has "file path"
| parse Message with * "file path: \"\"" FilePath "\"\"" *
| project Timestamp, FilePath
| take 5

14. Query Writing Checklist

Before running any KQL query, mentally check:

Pre-filtered? Large tables have a | where before any | summarize
Result bounded? Exploratory queries end with | take N or | top N
Dynamic columns cast? Any dynamic column in by/on/order by is wrapped
Regex has groups? extract_all patterns have () around what you want to capture
Join cardinality safe? Both sides checked with dcount() before joining
Needed columns only? Wide tables get | project to drop unneeded columns
Datetime literals valid? Using datetime(2024-01-01) not datetime(2024) or bare integers
Complex by-expressions? Use | extend first, then | summarize by the computed column
Error recovery plan? If a query fails, fix the specific error — don't change strategy

Adoption

microsoft/kql

$ install --global

Security Scan Results

SKILL.md

KQL Mastery

1. KQL Basics

Pipe-forward syntax

Query vs management commands

2. Dynamic Type Discipline

3. Join Patterns & Pitfalls

Equality only

Left/right attribute matching

Cardinality check before large joins

4. Regex in KQL

The extract_all gotcha

Regex toolkit — don't fall back to Python

5. Serialization Requirements

6. Memory-Safe Query Patterns

The progression of safety

Rules for large tables (>1M rows)

When you see E_LOW_MEMORY_CONDITION

When you see E_RUNAWAY_QUERY

7. Result Size Discipline

8. String Comparison Strictness

9. Advanced Functions

Vector similarity

Geo operations

Graph queries

Time series

10. Self-Correction Lookup Table

11. Datetime Pitfalls

Literal format

Filtering by year, month, or hour

Time bucketing in summarize

Useful datetime functions

12. Operator Naming & Equality

Naming conventions

Equality operators

sort vs order

contains vs has

13. Error Recovery Strategy

The pattern to avoid

The correct pattern

14. Query Writing Checklist

Related Skills

microsoft/microsoft-foundry

microsoft/azure-validate

microsoft/azure-quotas

microsoft/azure-deploy

microsoft/kql

$ install --global

Security Scan Results

SKILL.md

KQL Mastery

1. KQL Basics

Pipe-forward syntax

Query vs management commands

2. Dynamic Type Discipline

3. Join Patterns & Pitfalls

Equality only

Left/right attribute matching

Cardinality check before large joins

4. Regex in KQL

The extract_all gotcha

Regex toolkit — don't fall back to Python

5. Serialization Requirements

6. Memory-Safe Query Patterns

The progression of safety

Rules for large tables (>1M rows)

When you see E_LOW_MEMORY_CONDITION

When you see E_RUNAWAY_QUERY

7. Result Size Discipline

8. String Comparison Strictness

9. Advanced Functions

Vector similarity

Geo operations

Graph queries

Time series

10. Self-Correction Lookup Table

The `extract_all` gotcha

When you see `E_LOW_MEMORY_CONDITION`

When you see `E_RUNAWAY_QUERY`

The `extract_all` gotcha

When you see `E_LOW_MEMORY_CONDITION`

When you see `E_RUNAWAY_QUERY`