.remote-cache/kreuzberg-shared-rules/.ai-rulez/skills/polyglot-api-documentation-examples/SKILL.md
---
priority: high
---

# Polyglot API Documentation Examples
**CRITICAL: Complete language parity for API documentation.** ALL public APIs MUST be documented with examples in all 10 supported languages: Rust, Python, TypeScript, Ruby, PHP, Java, Go, C#, Elixir, and WebAssembly.

## Documentation Tools by Language

| Language | Tool | Output Format | File Extension | Key Strengths |
|----------|------|---------------|-----------------|---------------|
| Rust | rustdoc | HTML, integrated with cargo doc | .rs | Markdown, examples as doctests, cross-referencing |
| Python | Sphinx | HTML, PDF, ePub | .py | reStructuredText integration, autodoc directives |
| TypeScript | TypeDoc | HTML, Markdown, JSON | .ts / .tsx | JSDoc parsing, template customization, JSON exports |
| Ruby | YARD | HTML, Markdown, tags | .rb | @param/@return/@example, @overload support |
| PHP | PHPDocumentor | HTML, PDF | .php | @param/@return/@throws, markdown descriptions |
| Java | Javadoc | HTML | .java | @param/@return/@throws/@since/@deprecated tags |
| Go | godoc | HTML, plaintext | .go | doc.go files, package-level docs, examples in _test.go |
| C# | DocFX | HTML, Markdown | .cs | XML doc comments, xref links, TOC generation |
| Elixir | ExDoc | HTML, EPUB | .ex | Markdown, @doc/@spec, @deprecated, live examples |
| WebAssembly | JSDoc (on the JS wrapper) | Markdown, HTML | .wasm / .js | JSDoc for JS wrapper, inline comments |
Every public API must include:
/// Processes HTML and converts it to Markdown.
///
/// Takes HTML input and converts it to semantically equivalent Markdown,
/// preserving all text content, links, lists, and formatting.
///
/// # Arguments
/// * `html` - Valid HTML string (can be partial HTML, e.g., fragments without DOCTYPE)
/// * `options` - Configuration options for the conversion
///
/// # Returns
/// `Result<String, Error>` containing the Markdown output or an error
///
/// # Errors
/// * `Error::InvalidUtf8` - If HTML contains invalid UTF-8 sequences
/// * `Error::ProcessingFailed` - If internal processing fails
///
/// # Examples
///
/// ```
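/// use kreuzberg::{HtmlConverter, Options};
///
/// let html = "<h1>Hello</h1><p>World</p>";
/// // Assumes `Options` implements `Default`; pass your own options otherwise.
/// let md = HtmlConverter::convert(html, &Options::default())?;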
/// assert!(md.contains("# Hello"));
/// # Ok::<_, Box<dyn std::error::Error>>(())
/// ```
///
/// # Performance
/// - Time: O(n) where n is HTML input size
/// - Memory: O(n) for output string
pub fn convert(html: &str, options: &Options) -> Result<String, Error> {
// implementation
}
Documentation tools: cargo doc, rustdoc with mdBook for guides
from typing import Optional


def convert_html_to_markdown(
    html: str,
    options: Optional[ConversionOptions] = None
) -> str:
    """Convert HTML to Markdown with semantic preservation.

    Processes HTML input and produces semantically equivalent Markdown output.
    Supports partial HTML fragments (no DOCTYPE required).

    Args:
        html: Valid HTML string to convert. Can be a complete document or fragment.
        options: Configuration for the conversion process. Defaults to standard settings.

    Returns:
        Markdown representation of the input HTML.

    Raises:
        ValueError: If HTML is invalid or contains unsupported encodings.
        ProcessingError: If internal conversion fails.

    Examples:
        Basic conversion:

        >>> from kreuzberg import convert_html_to_markdown
        >>> html = "<h1>Hello</h1><p>World</p>"
        >>> md = convert_html_to_markdown(html)
        >>> "# Hello" in md
        True

        With custom options:

        >>> opts = ConversionOptions(preserve_attributes=True)
        >>> result = convert_html_to_markdown(html, opts)

    Note:
        This function is CPU-bound. For processing large documents,
        consider using ``convert_html_to_markdown_async()`` in an event loop.

    Performance:
        - Time complexity: O(n) where n is HTML input size
        - Memory usage: O(n) for output string
    """
    # implementation
    ...
Documentation tools: Sphinx, autodoc, napoleon extension for Google style
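The `>>>` examples in a docstring like the one above can double as tests. The following is a minimal sketch using the standard-library `doctest` module; `heading_to_markdown` is a hypothetical stand-in written only for this illustration and is not part of the kreuzberg API.

```python
import doctest
import re


def heading_to_markdown(html: str) -> str:
    """Convert <h1> elements to Markdown headings.

    Hypothetical stand-in used only to demonstrate doctest execution.

    Examples:
        >>> heading_to_markdown("<h1>Hello</h1>")
        '# Hello'
        >>> heading_to_markdown("no heading")
        'no heading'
    """
    return re.sub(r"<h1>(.*?)</h1>", r"# \1", html)


def run_docstring_examples(func) -> doctest.TestResults:
    """Execute the >>> examples in func's docstring and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Expose the function under its own name so the examples can call it.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


if __name__ == "__main__":
    print(run_docstring_examples(heading_to_markdown))
```

Running docstring examples this way (or through a test runner's doctest integration) keeps documented output from drifting away from actual behavior.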
/**
* Converts HTML to Markdown with complete semantic preservation.
*
* Takes HTML input (full documents or fragments) and produces semantically
* equivalent Markdown output. Handles all standard HTML elements and preserves
* formatting, links, and structure.
*
* @param html - The HTML string to convert. Can be a complete document or fragment.
* @param options - Configuration options for the conversion process.
* @returns Promise<string> - The converted Markdown text.
* @throws {InvalidHtmlError} If the HTML is malformed or unsupported.
* @throws {ProcessingError} If conversion fails internally.
*
* @example
* Basic usage:
* ```typescript
* import { convertHtmlToMarkdown } from 'kreuzberg';
*
* const html = '<h1>Hello</h1><p>World</p>';
* const markdown = await convertHtmlToMarkdown(html);
* console.log(markdown); // # Hello\n\nWorld
* ```
*
* @example
* With options:
* ```typescript
* const options = { preserveAttributes: true };
* const result = await convertHtmlToMarkdown(html, options);
* ```
*
* @remarks
* - Async operation for better performance with large documents
* - Fully typed with TypeScript strict mode compatibility
* - Uses native browser APIs when available
*
* @see {@link ConversionOptions} for available configuration
* @see {@link convertHtmlToMarkdownSync} for synchronous version
* @beta This API may change in future versions
*/
export async function convertHtmlToMarkdown(
html: string,
options?: ConversionOptions
): Promise<string> {
// implementation
}
Documentation tools: TypeDoc, JSDoc, TypeScript compiler, markdown output
# Converts HTML to Markdown with complete semantic preservation.
#
# Processes HTML input (complete documents or fragments) and generates
# semantically equivalent Markdown output. All formatting, links, and
# document structure are preserved in the conversion.
#
# @param html [String] The HTML string to convert. Can be a complete
# document or fragment (no DOCTYPE required).
# @param options [ConversionOptions, Hash] Configuration for the conversion.
# Defaults to standard conversion settings.
#
# @return [String] The converted Markdown text.
#
# @raise [InvalidHtmlError] If the HTML is malformed or contains
# unsupported syntax.
# @raise [ProcessingError] If the conversion process fails internally.
#
# @example Basic conversion
# require 'kreuzberg'
#
# html = '<h1>Hello</h1><p>World</p>'
# markdown = Kreuzberg.convert_html_to_markdown(html)
# markdown.include?('# Hello') #=> true
#
# @example With custom options
# options = { preserve_attributes: true }
# result = Kreuzberg.convert_html_to_markdown(html, options)
#
# @see ConversionOptions
# @see #convert_html_to_markdown_with_defaults
# @since 1.0.0
# @deprecated Use {#convert_with_streaming} for large documents
def self.convert_html_to_markdown(html, options = {})
# implementation
end
Documentation tools: YARD, yard-doc gem, markdown support with kramdown
/**
* Converts HTML to Markdown with complete semantic preservation.
*
* Takes HTML input (full documents or fragments) and produces semantically
* equivalent Markdown output. All formatting, links, and document structure
* are preserved.
*
* @param string $html The HTML string to convert. Can be a complete
* document or fragment (no DOCTYPE required).
* @param ConversionOptions|null $options Configuration for the conversion
* process. Null uses default settings.
*
* @return string The converted Markdown text.
*
* @throws InvalidHtmlException If HTML is malformed or unsupported
* @throws ProcessingException If the conversion fails internally
*
* @example Basic usage:
* ```php
* $html = '<h1>Hello</h1><p>World</p>';
* $markdown = Kreuzberg::convertHtmlToMarkdown($html);
* echo $markdown; // # Hello\n\nWorld
* ```
*
* @example With options:
* ```php
* $options = new ConversionOptions();
* $options->preserveAttributes = true;
* $result = Kreuzberg::convertHtmlToMarkdown($html, $options);
* ```
*
* @see ConversionOptions
* @see \Kreuzberg\Converter for streaming API
* @since 1.0.0
* @api
*/
public static function convertHtmlToMarkdown(
string $html,
?ConversionOptions $options = null
): string {
// implementation
}
Documentation tools: PHPDocumentor, phpdoc tags, markdown in descriptions
/**
 * Converts HTML to Markdown with complete semantic preservation.
 *
 * <p>Processes HTML input (complete documents or fragments) and generates
 * semantically equivalent Markdown output. All formatting, links, and
 * document structure are preserved.</p>
 *
 * <p>Basic usage:</p>
 * <pre>{@code
 * String html = "<h1>Hello</h1><p>World</p>";
 * String markdown = HtmlConverter.convertToMarkdown(html, null);
 * System.out.println(markdown); // # Hello\n\nWorld
 * }</pre>
 *
 * <p>With custom options:</p>
 * <pre>{@code
 * ConversionOptions options = new ConversionOptions()
 *     .preserveAttributes(true);
 * String result = HtmlConverter.convertToMarkdown(html, options);
 * }</pre>
 *
 * @param html the HTML string to convert; can be a complete document
 *     or fragment (no DOCTYPE required). Must not be null.
 * @param options configuration for the conversion process; if null,
 *     uses default conversion settings
 *
 * @return the converted Markdown text; never null
 *
 * @throws IllegalArgumentException if html is null
 * @throws InvalidHtmlException if the HTML is malformed or contains
 *     unsupported syntax
 * @throws ProcessingException if the conversion fails internally
 *
 * @apiNote This method is thread-safe for concurrent calls.
 * @since 1.0
 * @see ConversionOptions
 * @see #convertToMarkdownAsync(String, ConversionOptions)
 */
public static String convertToMarkdown(
String html,
ConversionOptions options
) throws InvalidHtmlException, ProcessingException {
// implementation
}
Documentation tools: Javadoc, Maven site plugin, HTML output
// ConvertHTMLToMarkdown converts HTML to Markdown with complete semantic preservation.
//
// Takes HTML input (full documents or fragments) and produces semantically
// equivalent Markdown output. All formatting, links, and document structure
// are preserved in the conversion process.
//
// Parameters:
// - html: The HTML string to convert. Can be a complete document or fragment
// (no DOCTYPE required).
// - options: Configuration for the conversion process. Nil uses defaults.
//
// Returns:
// - The converted Markdown string
// - An error if conversion fails
//
// Errors:
// - InvalidHTML: If the HTML is malformed or contains unsupported syntax
// - ProcessingError: If the internal conversion process fails
//
// Example:
//
//	html := "<h1>Hello</h1><p>World</p>"
//	md, err := kreuzberg.ConvertHTMLToMarkdown(html, nil)
//	if err != nil {
//		log.Fatal(err)
//	}
//	fmt.Println(md) // # Hello\n\nWorld
//
// Example with options:
//
//	opts := &kreuzberg.ConversionOptions{
//		PreserveAttributes: true,
//	}
//	result, err := kreuzberg.ConvertHTMLToMarkdown(html, opts)
//
// See Also:
// - ConversionOptions for configuration details
// - ConvertHTMLToMarkdownAsync for concurrent operations
//
// Concurrency:
//
// This function is safe for concurrent use from multiple goroutines.
func ConvertHTMLToMarkdown(html string, options *ConversionOptions) (string, error) {
// implementation
}
Documentation tools: godoc, go doc; doc comments use Go's own lightweight syntax (headings, lists, indented code blocks) rather than full Markdown
/// <summary>
/// Converts HTML to Markdown with complete semantic preservation.
/// </summary>
///
/// <remarks>
/// <para>
/// Processes HTML input (complete documents or fragments) and generates
/// semantically equivalent Markdown output. All formatting, links, and
/// document structure are preserved in the conversion.
/// </para>
/// <para>
/// This method is thread-safe and can be called concurrently from multiple
/// threads.
/// </para>
/// </remarks>
///
/// <param name="html">
/// The HTML string to convert. Can be a complete document or fragment
/// (no DOCTYPE required). Must not be null.
/// </param>
/// <param name="options">
/// Configuration for the conversion process. If null, uses default settings.
/// </param>
///
/// <returns>
/// The converted Markdown text as a string. Never null.
/// </returns>
///
/// <exception cref="ArgumentNullException">
/// Thrown when <paramref name="html"/> is null.
/// </exception>
/// <exception cref="InvalidHtmlException">
/// Thrown when the HTML is malformed or contains unsupported syntax.
/// </exception>
/// <exception cref="ProcessingException">
/// Thrown when the conversion process fails internally.
/// </exception>
///
/// <example>
/// <para>Basic usage:</para>
/// <code>
/// string html = "&lt;h1&gt;Hello&lt;/h1&gt;&lt;p&gt;World&lt;/p&gt;";
/// string markdown = HtmlConverter.ConvertToMarkdown(html);
/// Console.WriteLine(markdown); // # Hello\n\nWorld
/// </code>
/// </example>
///
/// <example>
/// <para>With custom options:</para>
/// <code>
/// var options = new ConversionOptions { PreserveAttributes = true };
/// string result = HtmlConverter.ConvertToMarkdown(html, options);
/// </code>
/// </example>
///
/// <seealso cref="ConversionOptions"/>
/// <seealso cref="ConvertToMarkdownAsync(string, ConversionOptions)"/>
/// <since>1.0</since>
public static string ConvertToMarkdown(
string html,
ConversionOptions? options = null
) {
// implementation
}
Documentation tools: DocFX, Sandcastle Help File Builder, XML to Markdown conversion
@doc """
Converts HTML to Markdown with complete semantic preservation.
Takes HTML input (full documents or fragments) and produces semantically
equivalent Markdown output. All formatting, links, and document structure
are preserved.
## Parameters
* `html` - The HTML string to convert. Can be a complete document or
fragment (no DOCTYPE required).
* `options` - Configuration for the conversion process. Defaults to
standard settings.
## Return value
The converted Markdown text as a binary string.
## Errors
* `{:error, :invalid_html}` - If the HTML is malformed or unsupported
* `{:error, :processing_failed}` - If conversion fails internally
## Examples
Basic conversion:
iex> html = "<h1>Hello</h1><p>World</p>"
iex> {:ok, md} = Kreuzberg.convert_html_to_markdown(html)
iex> String.contains?(md, "# Hello")
true
With custom options:
iex> opts = [preserve_attributes: true]
iex> {:ok, result} = Kreuzberg.convert_html_to_markdown(html, opts)
## Performance
* Time complexity: O(n) where n is HTML input size
* Memory usage: O(n) for output string
## See also
* `Kreuzberg.ConversionOptions` for configuration details
* `Kreuzberg.convert_html_to_markdown_async/2` for concurrent operations
## Since
1.0.0
"""
@spec convert_html_to_markdown(String.t(), keyword()) :: {:ok, String.t()} | {:error, atom()}
def convert_html_to_markdown(html, options \\ []) do
# implementation
end
Documentation tools: ExDoc, markdown with code blocks, @spec annotations
/**
* Converts HTML to Markdown with complete semantic preservation.
*
* Takes HTML input (full documents or fragments) and produces semantically
* equivalent Markdown output. All formatting, links, and document structure
* are preserved.
*
* @async
* @param {string} html - The HTML string to convert. Can be a complete
* document or fragment (no DOCTYPE required).
* @param {Object} [options] - Configuration for the conversion process
* @param {boolean} [options.preserveAttributes=false] - Keep HTML attributes
* @param {boolean} [options.stripComments=true] - Remove HTML comments
*
* @returns {Promise<string>} The converted Markdown text
*
* @throws {InvalidHtmlError} If the HTML is malformed or unsupported
* @throws {ProcessingError} If conversion fails internally
*
* @example
* // Basic conversion
* const html = '<h1>Hello</h1><p>World</p>';
* const markdown = await convertHtmlToMarkdown(html);
* console.log(markdown); // # Hello\n\nWorld
*
* @example
* // With options
* const options = { preserveAttributes: true };
* const result = await convertHtmlToMarkdown(html, options);
*
* @remarks
* - Requires WASM module to be initialized
* - Works in both Node.js and browser environments
* - Uses streaming internally for large documents
*
* @see {@link ConversionOptions} for all available configuration
* @see {@link convertHtmlToMarkdownSync} for synchronous version
* @deprecated Use convertHtmlToMarkdownV2 in future versions
*/
async function convertHtmlToMarkdown(html, options = {}) {
// implementation
}
Documentation tools: JSDoc, TypeDoc for TypeScript wrapper, markdown generation
When documenting a new API, ensure ALL language bindings include:
Start with the Rust implementation since all bindings depend on it:
/// New public API function.
///
/// # Examples
///
/// ```
/// // doctest here
/// ```
pub fn new_function(param: Type) -> Result<Output, Error> {
// implementation
}
For each language binding, add equivalent documentation:
Create language-specific examples:
docs/snippets/
├── rust/api/new_function.rs
├── python/api/new_function.py
├── typescript/api/new_function.ts
├── ruby/api/new_function.rb
├── php/api/new_function.php
├── java/api/NewFunction.java
├── go/api/new_function.go
├── csharp/api/NewFunction.cs
├── elixir/api/new_function.exs
└── wasm/api/new_function.js
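A layout like the tree above lends itself to an automated parity check. The sketch below assumes the `docs/snippets/<lang>/api/` structure shown, with PascalCase file names for Java and C# and snake_case elsewhere; `missing_snippets` and `SNIPPET_PATTERNS` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Relative snippet paths per language, mirroring the tree above.
# Java and C# use PascalCase file names; the rest use snake_case.
SNIPPET_PATTERNS = {
    "rust": "rust/api/{snake}.rs",
    "python": "python/api/{snake}.py",
    "typescript": "typescript/api/{snake}.ts",
    "ruby": "ruby/api/{snake}.rb",
    "php": "php/api/{snake}.php",
    "java": "java/api/{pascal}.java",
    "go": "go/api/{snake}.go",
    "csharp": "csharp/api/{pascal}.cs",
    "elixir": "elixir/api/{snake}.exs",
    "wasm": "wasm/api/{snake}.js",
}


def to_pascal(snake: str) -> str:
    """Convert snake_case to PascalCase, e.g. new_function -> NewFunction."""
    return "".join(part.capitalize() for part in snake.split("_"))


def missing_snippets(root: Path, api_name: str) -> list[str]:
    """Return the languages that have no snippet file for api_name under root."""
    missing = []
    for lang, pattern in SNIPPET_PATTERNS.items():
        rel = pattern.format(snake=api_name, pascal=to_pascal(api_name))
        if not (root / rel).is_file():
            missing.append(lang)
    return missing


if __name__ == "__main__":
    gaps = missing_snippets(Path("docs/snippets"), "new_function")
    print("missing:", ", ".join(gaps) if gaps else "none")
```

A check like this only confirms that the files exist; it does not validate their contents.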
For each language, generate and verify documentation builds:
task doc:rust # cargo doc
task doc:python # sphinx-build
task doc:typescript # typedoc
task doc:ruby # yard
task doc:php # phpdoc
task doc:java # javadoc
task doc:go # go doc
task doc:csharp # docfx
task doc:elixir # mix docs
task doc:wasm # jsdoc
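When all ten builds need to run in one pass, it can help to collect failures rather than stop at the first one. This is a rough sketch that assumes the `task doc:<lang>` targets listed above exist in the project's task runner; `build_all_docs` and `run_task` are hypothetical helper names.

```python
import subprocess
from typing import Callable, Sequence

# Language keys match the `task doc:<lang>` targets listed above.
DOC_LANGUAGES = (
    "rust", "python", "typescript", "ruby", "php",
    "java", "go", "csharp", "elixir", "wasm",
)


def run_task(args: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(args), check=False).returncode


def build_all_docs(run: Callable[[Sequence[str]], int] = run_task) -> list[str]:
    """Run every doc task and return the languages whose build failed."""
    failed = []
    for lang in DOC_LANGUAGES:
        # Keep going after a failure so one broken build does not hide others.
        if run(["task", f"doc:{lang}"]) != 0:
            failed.append(lang)
    return failed


if __name__ == "__main__":
    broken = build_all_docs()
    if broken:
        raise SystemExit("doc builds failed for: " + ", ".join(broken))
    print("all doc builds succeeded")
```

The `run` parameter is injectable so the loop can be exercised in tests without invoking the real toolchains.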
ALL error documentation must be language-specific:
/// # Errors
/// Returns `Error::InvalidInput` if preconditions not met.
"""
Raises:
ValueError: If input is invalid.
"""
/**
* @throws {InvalidInputError} If input is invalid.
*/
# @raise [InvalidInputError] if input is invalid
/**
* @throws InvalidInputException If input is invalid
*/
/**
* @throws IllegalArgumentException if input is invalid
*/
// Returns error if input is invalid.
/// <exception cref="ArgumentException">Thrown if input is invalid.</exception>
"""
Raises:
* `ArgumentError` - if input is invalid
"""
For auto-generated API documentation, maintain a mapping file:
# docs/api-mapping.yaml
# Signatures are quoted because several contain ": ", which is not allowed in plain YAML scalars.
functions:
  - name: convert_html_to_markdown
    rust: 'convert(html: &str, options: &Options) -> Result<String, Error>'
    python: 'convert_html_to_markdown(html: str, options: Optional[ConversionOptions]) -> str'
    typescript: 'convertHtmlToMarkdown(html: string, options?: ConversionOptions): Promise<string>'
    ruby: 'Kreuzberg.convert_html_to_markdown(html, options = {})'
    php: 'Kreuzberg::convertHtmlToMarkdown(string $html, ?ConversionOptions $options): string'
    java: 'HtmlConverter.convertToMarkdown(String html, ConversionOptions options)'
    go: 'ConvertHTMLToMarkdown(html string, options *ConversionOptions) (string, error)'
    csharp: 'HtmlConverter.ConvertToMarkdown(string html, ConversionOptions? options)'
    elixir: 'Kreuzberg.convert_html_to_markdown(html, options \\ [])'
    wasm: 'convertHtmlToMarkdown(html: string, options?: object): Promise<string>'
    notes: Converts HTML to Markdown with semantic preservation
    since: 1.0.0
    status: stable
    errors: InvalidHtmlError, ProcessingError
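A mapping file of this shape can be checked for gaps once it has been loaded into a dictionary (for example with a YAML parser). The sketch below assumes each entry carries one key per language, as in the example above; `missing_signatures` and `check_mapping` are hypothetical helpers, not part of any existing tooling.

```python
# The ten language keys used in the mapping file above.
SUPPORTED_LANGUAGES = (
    "rust", "python", "typescript", "ruby", "php",
    "java", "go", "csharp", "elixir", "wasm",
)


def missing_signatures(entry: dict) -> list[str]:
    """Return languages with no signature, or an empty one, in a mapping entry."""
    return [
        lang for lang in SUPPORTED_LANGUAGES
        if not str(entry.get(lang) or "").strip()
    ]


def check_mapping(mapping: dict) -> dict[str, list[str]]:
    """Map each function name to the languages it is missing; complete entries are omitted."""
    problems = {}
    for entry in mapping.get("functions", []):
        gaps = missing_signatures(entry)
        if gaps:
            problems[entry.get("name", "<unnamed>")] = gaps
    return problems


if __name__ == "__main__":
    # Illustrative data only: an entry that documents just two languages.
    sample = {"functions": [{"name": "new_function", "rust": "new_function()", "go": "NewFunction()"}]}
    print(check_mapping(sample))
```

An empty result means every listed function has a signature recorded for all ten languages.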
Create verification scripts to ensure documentation consistency:
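#!/bin/bash
# verify-api-docs.sh
set -euo pipefail

echo "Checking API documentation parity..."

# Check all languages have examples. The pattern covers the marker styles
# used across languages: @example, Example(s):, "# Examples", <example>, iex>.
missing=0
for lang in rust python typescript ruby php java go csharp elixir wasm; do
  if ! grep -rqE '@example|Examples?:|# Examples|<example>|iex>' "docs/snippets/$lang/api/" 2>/dev/null; then
    echo "❌ Missing examples for $lang"
    missing=1
  fi
done

# Verify documentation builds (set -e aborts on the first failing build)
cargo doc --no-deps
sphinx-build -b html docs/python _build/python-docs
typedoc --out _build/typescript-docs
yard --output-dir _build/ruby-docs
phpdoc -d src -t _build/php-docs
javadoc -d _build/java-docs src/main/java
# go doc takes one package at a time, so expand ./... with go list first
go list ./... | xargs -n 1 go doc
docfx _docfx.json
mix docs

if [ "$missing" -ne 0 ]; then
  echo "❌ Documentation verification failed"
  exit 1
fi
echo "✓ Documentation verification complete"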
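Because each language marks examples differently, a single search pattern is easy to get wrong. The following sketch makes the per-language markers explicit; the patterns are drawn from the doc-comment styles shown in this document and are assumptions about what counts as an example, not an exhaustive rule set. `EXAMPLE_MARKERS` and `has_example` are hypothetical names.

```python
import re

# One regular expression per language, based on the doc-comment styles
# shown in this document. Each pattern is matched line by line.
EXAMPLE_MARKERS = {
    "rust": r"^\s*///\s*# Examples",
    "python": r"^\s*Examples?:|^\s*>>> ",
    "typescript": r"@example",
    "ruby": r"@example",
    "php": r"@example",
    "java": r"@example|\{@code",
    "go": r"^\s*//\s*Example",
    "csharp": r"<example>",
    "elixir": r"^\s*## Examples|^\s*iex> ",
    "wasm": r"@example",
}


def has_example(lang: str, source: str) -> bool:
    """Return True if source contains an example marker for the given language."""
    pattern = EXAMPLE_MARKERS[lang]
    return re.search(pattern, source, flags=re.MULTILINE) is not None


if __name__ == "__main__":
    print(has_example("rust", "/// # Examples\n/// ```"))
    print(has_example("go", "// Returns error if input is invalid."))
```

Keeping the markers in a table also makes it obvious when a newly supported language has no pattern yet, since the lookup raises `KeyError`.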