.ai-rulez/skills/conversion-mapping-rules/SKILL.md
Conversion Mapping Rules: HTML Elements to Markdown
This skill documents how html-to-markdown maps 60+ HTML element types to their Markdown equivalents. The conversion logic respects Markdown syntax variations (ATX vs Setext headings, fenced vs indented code, etc.) and maintains semantic accuracy.
# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6
Implementation:
- Option: `HeadingStyle::Atx` (default)

HTML Example:
<h1>Title</h1> → # Title
<h2 id="intro">Intro</h2> → ## Intro
<h3>Detail</h3> → ### Detail
Heading 1
=========
Heading 2
---------
Implementation:
- Option: `HeadingStyle::Underlined`
- H1: `=` characters for full line width
- H2: `-` characters for full line width

HTML Example:
<h1>Main Title</h1> → Main Title\n===========
<h2>Subtitle</h2> → Subtitle\n---------
<h3>Detail</h3> → ### Detail (fallback to ATX)
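The ATX and underlined behaviour above, including the ATX fallback for levels 3 to 6, can be sketched as follows. This is a minimal illustration, not the crate's code: `render_heading` and `underline` are hypothetical helpers, the enum here carries only two of the variants, and the underline length is assumed to equal the heading's character count.

```rust
/// Two of the heading styles named in this document (closed ATX left out for brevity).
#[derive(Clone, Copy)]
pub enum HeadingStyle {
    Atx,
    Underlined,
}

/// Render one heading. Hypothetical helper, not the crate's API.
pub fn render_heading(level: usize, text: &str, style: HeadingStyle) -> String {
    let level = level.clamp(1, 6);
    match (style, level) {
        // Setext underlines exist only for levels 1 and 2.
        (HeadingStyle::Underlined, 1) => underline(text, '='),
        (HeadingStyle::Underlined, 2) => underline(text, '-'),
        // Plain ATX, and the ATX fallback for underlined h3 to h6.
        _ => format!("{} {}", "#".repeat(level), text),
    }
}

fn underline(text: &str, ch: char) -> String {
    // Assumption: the underline is as long as the heading's character count.
    let width = text.chars().count().max(1);
    format!("{}\n{}", text, ch.to_string().repeat(width))
}
```

Matching on the `(style, level)` pair keeps the fallback rule in one place: anything that is not an underlined h1 or h2 takes the ATX arm.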
# Heading 1 #
## Heading 2 ##
### Heading 3 ###
Implementation:
- Option: `HeadingStyle::AtxClosed`

### Paragraphs (`<p>`)

Example:
<p>This is a paragraph with <strong>bold</strong> text.</p>
→ This is a paragraph with **bold** text.\n
### Division (`<div>`)

Example:
<div>
<p>Paragraph inside div</p>
</div>
→ Paragraph inside div\n
### Blockquote (`<blockquote>`)

Mapping:
- Each line prefixed with `>`
- Nested blockquotes: `> >`

Example:
<blockquote>
<p>Quote line 1</p>
<p>Quote line 2</p>
</blockquote>
→ > Quote line 1\n>\n> Quote line 2\n
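A rough sketch of the prefixing step: the function below takes Markdown that has already been converted and adds the quote markers, emitting a bare `>` for blank lines so that paragraphs stay inside the same quote. `quote_block` is a hypothetical name used only for illustration.

```rust
/// Prefix already-converted Markdown with blockquote markers.
/// Hypothetical helper, not the crate's API.
pub fn quote_block(markdown: &str) -> String {
    let mut out = String::new();
    for line in markdown.trim_end().lines() {
        if line.is_empty() {
            // A bare marker keeps the blank line inside the quote.
            out.push_str(">\n");
        } else {
            out.push_str("> ");
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Applying the function to its own output yields the nested form, since each existing `> ` line simply gains another prefix.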
### Preformatted Text (`<pre>`)

Example:
<pre> code with spaces</pre>
→ (indented code or fenced, depends on CodeBlockStyle)
Indented Style (Default):
    line 1
    line 2
    line 3
Implementation:
- Option: `CodeBlockStyle::Indented`

**Fenced Backtick Style:**
````markdown
```language
code here
```
````
**Implementation:**
- Option: `CodeBlockStyle::Backticks`
- Triple backticks with optional language specifier
- Language from HTML class (e.g., `language-rust` → `rust`)
- Can contain blank lines

**Fenced Tilde Style:**
```markdown
~~~rust
code here
~~~
```
**Implementation:**
- Option: `CodeBlockStyle::Tildes`

**HTML Mapping:**
<pre><code>simple code</code></pre>
<pre><code class="language-python">def foo(): pass</code></pre>
<pre>indented code</pre>
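The three code block styles and the `language-…` class lookup can be sketched as below. The enum variant names come from this document; `language_from_class` and `render_code_block` are hypothetical helpers, and the sketch does not lengthen the fence when the code itself contains a fence.

```rust
/// Code block styles named in this document.
#[derive(Clone, Copy)]
pub enum CodeBlockStyle {
    Indented,
    Backticks,
    Tildes,
}

/// Extract `rust` from a class attribute such as `language-rust`.
pub fn language_from_class(class: &str) -> Option<&str> {
    class
        .split_whitespace()
        .find_map(|c| c.strip_prefix("language-"))
        .filter(|lang| !lang.is_empty())
}

/// Render a code block in the chosen style (hypothetical helper).
pub fn render_code_block(code: &str, class: Option<&str>, style: CodeBlockStyle) -> String {
    let lang = class.and_then(language_from_class).unwrap_or("");
    match style {
        CodeBlockStyle::Indented => {
            // Four spaces per line; blank lines stay empty. The language is dropped.
            let lines: Vec<String> = code
                .lines()
                .map(|l| if l.is_empty() { String::new() } else { format!("    {l}") })
                .collect();
            format!("{}\n", lines.join("\n"))
        }
        CodeBlockStyle::Backticks => format!("```{lang}\n{code}\n```\n"),
        CodeBlockStyle::Tildes => format!("~~~{lang}\n{code}\n~~~\n"),
    }
}
```

Indented blocks have no place to record a language, which is one reason a fenced style may be preferable when the HTML carries a `language-…` class.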
### Horizontal Rule (`<hr>`)

Output: `---\n` (three dashes)

Alternatives: `***` and `___` are also valid Markdown, but output is standardized to `---`.

### Unordered Lists (`<ul>`)

Default Syntax (dashes):
- Item 1
- Item 2
  - Nested item
    - Deeply nested
Implementation:
- `-` marker (could be `*` or `+`, but `-` is default)
- Indentation: `ListIndentType::Spaces` (default) or `ListIndentType::Tabs`

### Ordered Lists (`<ol>`)

1. First item
2. Second item
3. Third item
Implementation:
- `1.` through `9.` for first 9 items (reset per list)
- Delimiter: `. ` (dot space)

### List Items (`<li>`)

Behavior:
- First paragraph
  Second paragraph (indented)
HTML Example:
<ul>
<li>
<p>Item with paragraph</p>
<p>Second paragraph</p>
</li>
</ul>
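Nested list rendering with the two indent types can be sketched as follows. `Item` and `render_list` are hypothetical names, and the indent width for `ListIndentType::Spaces` is an assumption: the sketch indents children by the width of the parent marker plus one space, which is what CommonMark needs for the child to count as nested.

```rust
/// Indent types named in this document.
#[derive(Clone, Copy)]
pub enum ListIndentType {
    Spaces,
    Tabs,
}

/// A list item with optional nested children (hypothetical type for illustration).
pub struct Item {
    pub text: String,
    pub children: Vec<Item>,
}

/// Render a list recursively; `prefix` is the indentation accumulated so far.
pub fn render_list(items: &[Item], ordered: bool, indent: ListIndentType, prefix: &str) -> String {
    let mut out = String::new();
    for (i, item) in items.iter().enumerate() {
        // Numbering restarts at 1 for every list, including nested ones.
        let marker = if ordered { format!("{}.", i + 1) } else { "-".to_string() };
        out.push_str(&format!("{prefix}{marker} {}\n", item.text));
        let child_prefix = match indent {
            // Assumption: align children with the parent's text.
            ListIndentType::Spaces => format!("{prefix}{}", " ".repeat(marker.len() + 1)),
            ListIndentType::Tabs => format!("{prefix}\t"),
        };
        out.push_str(&render_list(&item.children, ordered, indent, &child_prefix));
    }
    out
}
```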
### Definition Lists (`<dl>`, `<dt>`, `<dd>`)

Term
: Definition
Another Term
: Definition 1
: Definition 2
Implementation:
- `<dt>`: Term on its own line
- `<dd>`: Definition with `:` prefix and indentation

### Tables (`<table>`, `<tr>`, `<td>`, `<th>`)

Mapping:
| Header 1 | Header 2 |
|----------|----------|
| Cell 1 | Cell 2 |
| Cell 3 | Cell 4 |
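A pipe-delimited table like the one above can be assembled roughly as follows. `escape_cell` and `render_table` are hypothetical helpers; pipe escaping is included because a raw `|` inside a cell would otherwise split it, and mapping in-cell newlines to `<br>` is an assumption about how line breaks are represented.

```rust
/// Escape characters that would break a GFM table cell (hypothetical helper).
pub fn escape_cell(cell: &str) -> String {
    // Assumption: newlines inside a cell are represented as `<br>`.
    cell.replace('|', "\\|").replace('\n', "<br>")
}

fn render_row(cells: &[&str]) -> String {
    let escaped: Vec<String> = cells.iter().map(|c| escape_cell(c)).collect();
    format!("| {} |\n", escaped.join(" | "))
}

/// Build a GFM table from a header row and data rows.
pub fn render_table(header: &[&str], rows: &[Vec<&str>]) -> String {
    let mut out = render_row(header);
    // Separator row: one group of three dashes per column.
    out.push_str(&format!("|{}\n", "---|".repeat(header.len())));
    for row in rows {
        out.push_str(&render_row(row));
    }
    out
}
```

The separator uses the minimum width; padding it to match the header text, as in the example above, is cosmetic and does not change how the table parses.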
Implementation:
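- `<table>` → GFM (GitHub Flavored Markdown) table
- `<thead>` content becomes header row
- `<tbody>` rows become data rows
- Cells delimited with `|` pipes
- Separator row: `|---|---|` (minimum 3 dashes)
- Alignment: `:---|` Left: `|:--` Center: `:--:`

Cell Content:
- Pipes escaped (`|` → `\|`)
- Inline formatting preserved (`<strong>` → `**`)
- Line breaks: `<br>` representation

### Semantic Elements

- `<article>`
- `<section>`
- `<nav>`
- `<aside>`
- `<header>`
- `<footer>` (`---\n`)
- `<main>`

### Emphasis (`<em>`, `<i>`)

Mapping: `*text*` or `_text_`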
Implementation:
- `*` (asterisk italic)

Example:
<em>emphasized</em> → *emphasized*
<i>italic</i> → *italic*
### Strong (`<strong>`, `<b>`)

Mapping: `**text**`

Example:
<strong>bold</strong> → **bold**
<b>bold</b> → **bold**
<strong><em>bold italic</em></strong> → ***bold italic***
### Inline Code (`<code>`)

Mapping: `` `text` `` (backtick inline code)

Example:
<code>variable_name</code> → `variable_name`
<code>don't</code> → `don't`
<code>`already_quoted`</code> → `` `already_quoted` ``
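The last example follows the CommonMark rule for code spans: the delimiter must be a backtick run longer than any run inside the content, and content that starts or ends with a backtick needs a space of padding. A minimal sketch of that rule, with `inline_code` as a hypothetical helper name:

```rust
/// Wrap text in a code span whose delimiter cannot collide with the content.
/// Hypothetical helper, not the crate's API.
pub fn inline_code(text: &str) -> String {
    // Longest run of consecutive backticks inside the content.
    let mut longest = 0;
    let mut run = 0;
    for ch in text.chars() {
        run = if ch == '`' { run + 1 } else { 0 };
        longest = longest.max(run);
    }
    let fence = "`".repeat(longest + 1);
    // Pad when the content touches the delimiter with a backtick of its own.
    if text.starts_with('`') || text.ends_with('`') {
        format!("{fence} {text} {fence}")
    } else {
        format!("{fence}{text}{fence}")
    }
}
```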
### Links (`<a href>`)

Mapping: `[link text](url "title")`

Implementation:
- `href` attribute becomes URL
- `title` attribute becomes optional title (in quotes)
- `href="#section"` → Anchor link
- `href="/page"` → Internal link (relative)
- `href="https://external.com"` → External link
- `href="mailto:[email protected]"` → Email link
- `href="tel:+1234567890"` → Phone link

Examples:
<a href="https://example.com">Link</a>
→ [Link](https://example.com)
<a href="/page" title="My Page">Internal</a>
→ [Internal](/page "My Page")
<a href="#section">Anchor</a>
→ [Anchor](#section)
<a href="mailto:[email protected]">Email</a>
→ [Email](mailto:[email protected])
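The link mapping above amounts to a small formatting step. The sketch below assumes a hypothetical `render_link` helper and escapes double quotes in the title so that it stays a single quoted string; the real converter may treat titles differently.

```rust
/// Format an inline link with an optional title (hypothetical helper).
pub fn render_link(text: &str, href: &str, title: Option<&str>) -> String {
    match title {
        // Escape double quotes so the title remains one quoted string.
        Some(t) if !t.is_empty() => {
            format!("[{text}]({href} \"{}\")", t.replace('"', "\\\""))
        }
        _ => format!("[{text}]({href})"),
    }
}
```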
### Images (`<img>`)

Mapping: `![alt text](url "title")`

Implementation:
- `src` attribute becomes URL
- `alt` attribute becomes alt text
- `title` attribute becomes optional title
- Dimensions (`width`, `height`) captured in metadata

Examples:
<img src="photo.jpg" alt="A photo">
→ ![A photo](photo.jpg)
<img src="image.png" alt="Image" title="My Image" width="200" height="150">
→ ![Image](image.png "My Image")
<img src="data:image/png;base64,..." alt="Embedded">
→ ![Embedded](data:image/png;base64,...)
### Line Breaks (`<br>`)

Mapping:
- `  \n` (two trailing spaces, then newline)
- `\\\n` (backslash, then newline)

Option: `NewlineStyle::Spaces` (default) or `NewlineStyle::Backslash`

Example:
<p>Line 1<br>Line 2</p>
→ Line 1  \nLine 2\n
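The two hard-break forms differ only in the characters placed before the newline. A minimal sketch, using the option names from this document and a hypothetical `render_break` helper:

```rust
/// Newline styles named in this document.
#[derive(Clone, Copy)]
pub enum NewlineStyle {
    Spaces,
    Backslash,
}

/// Return the Markdown hard line break for the chosen style (hypothetical helper).
pub fn render_break(style: NewlineStyle) -> &'static str {
    match style {
        // Two trailing spaces before the newline.
        NewlineStyle::Spaces => "  \n",
        // A single backslash before the newline.
        NewlineStyle::Backslash => "\\\n",
    }
}
```

The backslash form survives editors that strip trailing whitespace, which is a common reason to prefer it.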
### Strikethrough (`<s>`, `<del>`, `<strike>`)

Mapping: `~~strikethrough~~`

Example:
<del>removed text</del> → ~~removed text~~
<s>strikethrough</s> → ~~strikethrough~~
### Subscript and Superscript (`<sub>`, `<sup>`)

Example:
H<sub>2</sub>O → H2O (plain text)
E=mc<sup>2</sup> → E=mc2 (plain text)
### Highlight (`<mark>`)

Options:
- `HighlightStyle::DoubleEqual`: `==text==`
- `HighlightStyle::Html`: `<mark>text</mark>`
- `HighlightStyle::Bold`: `**text**`
- `HighlightStyle::None`: plain text

Example:
<mark>highlighted</mark>
→ ==highlighted== (DoubleEqual mode)
→ <mark>highlighted</mark> (Html mode)
→ **highlighted** (Bold mode)
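The four highlight modes map naturally onto a single `match`. The variant names below are the ones listed in this document; `render_mark` is a hypothetical helper for illustration.

```rust
/// Highlight styles named in this document.
#[derive(Clone, Copy)]
pub enum HighlightStyle {
    DoubleEqual,
    Html,
    Bold,
    None,
}

/// Render highlighted text in the chosen style (hypothetical helper).
pub fn render_mark(text: &str, style: HighlightStyle) -> String {
    match style {
        HighlightStyle::DoubleEqual => format!("=={text}=="),
        HighlightStyle::Html => format!("<mark>{text}</mark>"),
        HighlightStyle::Bold => format!("**{text}**"),
        // Drop the markup and keep only the text.
        HighlightStyle::None => text.to_string(),
    }
}
```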
### Ruby Annotations (`<ruby>`, `<rt>`, `<rp>`)

Mapping:
- `text {rt_text}` or similar

Example:
<ruby>漢字<rt>かんじ</rt></ruby>
→ 漢字 (かんじ)
### Audio (`<audio>`)

Behavior:
- `src` attribute

Handling:
<audio src="sound.mp3">Audio</audio>
→ (Skipped or converted to link in metadata)

### Video (`<video>`)

Behavior:
- `poster` image

### Picture (`<picture>`, `<source>`)

Behavior:
- `<img>` inside
- `src`

### Input (`<input>`)

### Select (`<select>`, `<option>`)

### Button (`<button>`)

Behavior:
- (`<button>` wrapper)

### Textarea (`<textarea>`)

### SVG (`<svg>`)

Behavior:
- `inline-images` can extract inline SVG

### Math (`<math>`)

### Iframe (`<iframe>`)
Normalized (default):
Strict:
Options:
- `escape_asterisks`: `*` → `\*`
- `escape_underscores`: `_` → `\_`
- `escape_misc`: Special chars `\ & < [ > ~ # = + | -`
- `escape_ascii`: All ASCII punctuation (CommonMark spec)

Example:
<p>Price: $10 & free shipping *limited time*</p>
escape_misc=true:
→ Price: $10 \& free shipping *limited time*
escape_asterisks=true:
→ Price: $10 & free shipping \*limited time\*
escape_ascii=true:
→ Price: \$10 \& free shipping \*limited time\*
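The first three options can be sketched as independent flags checked per character. `EscapeOptions` and `escape_text` are hypothetical names, the `escape_misc` character set is an assumption read off the option list above, and `escape_ascii` is left out because its exact character set is not spelled out here.

```rust
/// Escaping flags modelled on the option names in this document (hypothetical struct).
#[derive(Default, Clone, Copy)]
pub struct EscapeOptions {
    pub escape_asterisks: bool,
    pub escape_underscores: bool,
    pub escape_misc: bool,
}

/// Backslash-escape characters selected by `opts` (hypothetical helper).
pub fn escape_text(text: &str, opts: EscapeOptions) -> String {
    // Assumed character set for `escape_misc`.
    const MISC: &str = "\\&<[>~#=+|-";
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        let needs_escape = (ch == '*' && opts.escape_asterisks)
            || (ch == '_' && opts.escape_underscores)
            || (opts.escape_misc && MISC.contains(ch));
        if needs_escape {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}
```

Keeping the flags independent means literal asterisks can be escaped without also escaping every `-` or `#` in running text.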
Key Files:
- `/crates/html-to-markdown/src/converter.rs` - Element dispatch and conversion
- `/crates/html-to-markdown/src/options.rs` - Style configuration enums
- `/crates/html-to-markdown/src/text.rs` - Text escaping and normalization

// From converter.rs pattern
match element.tag_name() {
    "h1" | "h2" | "h3" | "h4" | "h5" | "h6" => convert_heading(...),
    "p" => convert_paragraph(...),
    "a" => convert_link(...),
    "img" => convert_image(...),
    "strong" | "b" => convert_strong(...),
    "em" | "i" => convert_em(...),
    "code" => convert_code(...),
    "pre" => convert_pre(...),
    "blockquote" => convert_blockquote(...),
    "ul" | "ol" => convert_list(...),
    "li" => convert_list_item(...),
    "table" => convert_table(...),
    "br" => convert_br(...),
    "hr" => convert_hr(...),
    // ... 40+ more elements
    _ => convert_generic_element(...)
}
See `/crates/html-to-markdown/src/visitor.rs` for the exhaustive `NodeType` enum covering all 60+ supported elements.
| HTML Element | Markdown Output | Notes |
|--------------|-----------------|-------|
| `<h1>` | `# text` | ATX style default |
| `<p>` | `text\n` | Paragraph |
| `<strong>` | `**text**` | Bold |
| `<em>` | `*text*` | Italic |
| `<a href>` | `[text](url)` | Link |
| `<img>` | `![alt](src)` | Image |
| `<ul>` | `- item` | Unordered list |
| `<ol>` | `1. item` | Ordered list |
| `<code>` | `` `text` `` | Inline code |
| `<pre>` | Indented or fenced | Code block |
| `<blockquote>` | `> text` | Quote |
| `<table>` | GFM table | Pipe-delimited |
| `<br>` | `  \n` | Line break |
| `<hr>` | `---` | Horizontal rule |
| `<del>` | `~~text~~` | Strikethrough |
| `<mark>` | `==text==` | Highlight (configurable) |