skills/wp-to-jekyll/SKILL.md
Migrate WordPress content to Jekyll. Use when asked to "convert WordPress to Jekyll", "migrate WP to Jekyll", "set up Jekyll from WordPress", "WordPress to static site", or "export WordPress to markdown". Covers content extraction, format conversion, Jekyll architecture setup, and deployment.
npx skillsauth add koolamusic/wpmigrate-skills wp-to-jekyll
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Step-by-step guide for converting WordPress content into a Jekyll static site. Covers extracting content from a WordPress HTML clone or XML export, transforming it into Jekyll-compatible files with proper frontmatter, setting up collections, and deploying.
Reference this skill when:
Clone the jekyllwind starter repo to get a pre-configured Jekyll + Tailwind CSS project:
git clone https://github.com/koolamusic/jekyllwind my-jekyll-site
cd my-jekyll-site
bundle install && pnpm install
bundle exec jekyll serve # Dev server at localhost:4000
This gives you a working Jekyll + Tailwind foundation with PostCSS already configured. From here, add your migrated WordPress content into the _posts/, _layouts/, and pages/ directories.
Why use the starter? Setting up Jekyll with Tailwind CSS and PostCSS from scratch requires coordinating Ruby gems, Node packages, and build config. The starter handles all of this so you can focus on migrating content.
| Tool | Version | Purpose |
|------|---------|---------|
| Ruby | 3.2+ | Jekyll runtime |
| Bundler | latest | Ruby dependency management |
| Node.js | 18+ | Tailwind CSS / asset compilation |
| pnpm | latest | Node package manager (used by jekyllwind starter) |
| Python 3 | 3.8+ | Content extraction and cleanup scripts |
| BeautifulSoup4 | latest | HTML parsing in Python scripts |
If you prefer to set up manually instead of cloning jekyllwind:
# Gemfile
source 'https://rubygems.org'
gem 'jekyll', '~> 4.4'
gem 'webrick' # Dev server (no longer bundled with Ruby since 3.0)
gem 'jekyll-postcss-v2' # PostCSS/Tailwind integration (optional)
gem 'jekyll-feed' # RSS/Atom feed generation
gem 'jekyll-sitemap' # XML sitemap
gem 'jekyll-seo-tag' # Meta tags and structured data
gem 'logger' # Being moved out of Ruby's default gems; declare explicitly
gem 'csv' # Bundled gem as of Ruby 3.4; declare explicitly
gem 'base64' # Bundled gem as of Ruby 3.4; declare explicitly
bundle install && pnpm install # Install all dependencies
bundle exec jekyll serve # Dev server at localhost:4000
bundle exec jekyll build # Production build to _site/
You need WordPress content in one of two forms:
Mirror the live WordPress site to capture rendered HTML and media:
# HTTrack
httrack "https://your-site.com" -O ./mirror \
--mirror --robots=0 --depth=10
# wget alternative
wget --mirror --convert-links --adjust-extension \
--page-requisites --no-parent https://your-site.com
The mirror gives you the rendered output of every page, including page builder content, plugin output, and all media files.
Export from WP Admin → Tools → Export. This gives you structured content with metadata but no media files and no rendered page builder output.
The XML export provides metadata (dates, categories, tags). The mirror provides clean rendered HTML and media files. Cross-reference both for the best result.
A Python script parses each source file, classifies it by URL pattern, extracts frontmatter, pulls body content, and writes Jekyll-compatible files.
| WordPress URL Pattern | Jekyll Output | Collection |
|----------------------|---------------|------------|
| /YYYY/MM/DD/slug/ | _posts/YYYY-MM-DD-slug.html | Blog posts |
| /category/slug/ | Skip (Jekyll generates these) | — |
| /tag/slug/ | Skip (Jekyll generates these) | — |
| /author/slug/ | Skip or pages/ | — |
| /page-slug/ | pages/slug.html | Standalone pages |
| Custom post types | _collection-name/slug.html | Custom collections |
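The classification table above can be sketched as a small routing function. This is a minimal illustration, not the skill's actual script: the name `classify_url`, the `custom_types` parameter, and the assumption that custom post types live under `/<type>/<slug>/` are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Dated permalinks such as /2024/01/15/my-post/
POST_RE = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/([^/]+)/?$")
# Archive pages that are skipped per the table above
SKIP_PREFIXES = ("/category/", "/tag/", "/author/")


def classify_url(url, custom_types=()):
    """Return (kind, output_path); output_path is None when the URL is skipped."""
    path = urlparse(url).path
    match = POST_RE.match(path)
    if match:
        year, month, day, slug = match.groups()
        return "post", f"_posts/{year}-{month}-{day}-{slug}.html"
    if path.startswith(SKIP_PREFIXES):
        return "skip", None
    parts = [p for p in path.split("/") if p]
    # Assumption: custom post types are served from /<type>/<slug>/
    if len(parts) == 2 and parts[0] in custom_types:
        return "collection", f"_{parts[0]}/{parts[1]}.html"
    if len(parts) == 1:
        return "page", f"pages/{parts[0]}.html"
    return "unknown", None
```

Anything that comes back as `unknown` (the home page, paginated archives, nested pages) is worth reviewing by hand rather than guessing a destination.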
Derive YAML frontmatter from WordPress HTML meta tags or WXR XML:
From HTML mirror (OpenGraph tags):
# Meta tags → frontmatter fields
'og:title' → title
'og:description' → description
'og:image' → image
'og:url' → permalink
'article:published_time' → date
'article:modified_time' → last_modified_at
'article:tag' → tags (multiple)
'article:section' → categories
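One way to apply the OpenGraph mapping above, sketched with the standard library's `html.parser` instead of BeautifulSoup so it runs without extra dependencies. The class and function names are illustrative, and `og:url` is copied as-is here; you will probably want to reduce it to its path before using it as a permalink.

```python
from html.parser import HTMLParser

# Mapping copied from the list above; article:tag is handled separately
OG_TO_FRONTMATTER = {
    "og:title": "title",
    "og:description": "description",
    "og:image": "image",
    "og:url": "permalink",
    "article:published_time": "date",
    "article:modified_time": "last_modified_at",
    "article:section": "categories",
}


class MetaCollector(HTMLParser):
    """Collect <meta property=... content=...> tags into a frontmatter dict."""

    def __init__(self):
        super().__init__()
        self.frontmatter = {"tags": []}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop, value = attr.get("property"), attr.get("content")
        if not prop or value is None:
            return
        if prop == "article:tag":
            # article:tag may appear several times, once per tag
            self.frontmatter["tags"].append(value)
        elif prop in OG_TO_FRONTMATTER:
            self.frontmatter[OG_TO_FRONTMATTER[prop]] = value


def extract_frontmatter(html):
    collector = MetaCollector()
    collector.feed(html)
    return collector.frontmatter
```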
From WXR XML:
# XML elements → frontmatter fields
'<title>' → title
'<wp:post_date>' → date
'<category domain="category">' → categories
'<category domain="post_tag">' → tags
'<wp:status>' → published (true/false)
'<content:encoded>' → body content
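The WXR mapping above could be read with `xml.etree.ElementTree` along these lines. This is a sketch under assumptions: the namespace URIs are the ones declared by WXR 1.2 exports (check the `<rss>` element of your own file), and `parse_wxr` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (WXR 1.2); verify against your export's <rss> tag
NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


def parse_wxr(xml_text):
    """Yield one dict per <item>, using the element-to-field mapping above."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        cats = item.findall("category")
        status = item.findtext("wp:status", default="", namespaces=NS)
        yield {
            "title": item.findtext("title", default=""),
            "date": item.findtext("wp:post_date", default="", namespaces=NS),
            "categories": [c.text for c in cats if c.get("domain") == "category"],
            "tags": [c.text for c in cats if c.get("domain") == "post_tag"],
            "published": status == "publish",
            "body": item.findtext("content:encoded", default="", namespaces=NS),
        }
```

For very large exports, `ET.iterparse` would avoid loading the whole file into memory; the one-shot parse is kept here for brevity.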
Isolate the article body from WordPress page chrome:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Target the main content div — class varies by theme
content = (
    soup.find("div", class_="post-content") or
    soup.find("div", class_="entry-content") or
    soup.find("article", class_="post") or
    soup.find("main")
)
This strips navigation, sidebars, related posts, comments, and footer.
Prefer data-src over src (WordPress lazy-loading stores real URL in data-src)
Strip query parameters: ?resize=800,600&ssl=1 → clean URL
Rewrite upload paths: wp-content/uploads/2024/01/photo.jpg → /assets/images/uploads/2024/01/photo.jpg
Store media under assets/images/uploads/YYYY/MM/
_posts/2024-01-15-my-blog-post.html
---
layout: post
title: "My Blog Post Title"
date: 2024-01-15
description: "A brief description from the meta tag"
image: /assets/images/uploads/2024/01/featured.jpg
categories: [technology, web-development]
tags: [jekyll, wordpress, migration]
---
<p>The extracted and cleaned article content goes here...</p>
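The image-handling rules in this section (prefer data-src, drop the resize query string, rewrite wp-content/uploads paths) might be combined into one helper like the following. `localize_image_url` is a hypothetical name, and it takes a plain attribute dict so the logic can be tested without a parser.

```python
from urllib.parse import urlparse

UPLOADS_MARKER = "wp-content/uploads/"


def localize_image_url(attrs):
    """Pick the real image URL from <img> attributes and map it to a local path."""
    # Lazy-loading plugins keep the real URL in data-src, so check it first
    raw = attrs.get("data-src") or attrs.get("src") or ""
    # urlparse separates the query string, so ?resize=...&ssl=1 is dropped
    path = urlparse(raw).path
    if UPLOADS_MARKER in path:
        relative = path.split(UPLOADS_MARKER, 1)[1]
        return "/assets/images/uploads/" + relative
    # Not a WordPress upload (for example a CDN image): leave it untouched
    return raw
```

With BeautifulSoup, `img.attrs` can be passed in directly since it behaves like a dict.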
WordPress themes inject deeply nested wrapper divs, custom classes, and inline styles. A cleanup script handles these systematically.
Run a Python script with BeautifulSoup to clean WordPress artifacts:
Strip Gutenberg comments — Remove <!-- wp:paragraph -->, <!-- /wp:image -->, etc.
import re
content = re.sub(r'<!--\s*/?wp:.*?-->', '', content)
Unwrap page builder containers — Peel nested wrappers from Visual Composer, Elementor, Divi:
# Classes to unwrap (element replaced by its children)
unwrap_classes = [
    'wpb_row', 'row-fluid', 'vc_inner', 'vc_column_container',
    'wp-block-image', 'wp-block-embed', 'wp-block-gallery',
    'elementor-widget-container', 'et_pb_module_inner'
]
for cls in unwrap_classes:
    for el in soup.find_all(class_=cls):
        el.unwrap()
Clean images — Strip WordPress-specific attributes, add lazy loading:
for img in soup.find_all('img'):
    # Use data-src if available (lazy loading plugins)
    if img.get('data-src'):
        img['src'] = img['data-src']
    # Strip WP attributes
    for attr in ['data-src', 'srcset', 'sizes', 'width', 'height',
                 'decoding', 'fetchpriority', 'title']:
        img.attrs.pop(attr, None)
    # Strip WP classes
    if img.get('class'):
        img['class'] = [c for c in img['class']
                        if not c.startswith(('wp-image-', 'aligncenter', 'size-'))]
    img['loading'] = 'lazy'
Strip inline styles — WordPress content often has inline style attributes that override new theme CSS:
for el in soup.find_all(style=True):
    del el['style']
Remove empty elements — Iteratively delete empty <p>, <div>, <span>, headings:
for tag_name in ['p', 'div', 'span', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
    for el in soup.find_all(tag_name):
        if not el.get_text(strip=True) and not el.find(['img', 'iframe', 'video']):
            el.decompose()
Remove duplicate headings — If <h1> text matches the frontmatter title, remove it (the Jekyll layout renders it)
Embed media URLs — Convert bare YouTube/Vimeo URLs to responsive iframes:
# Convert: https://www.youtube.com/watch?v=XXXX
# To: <div class="aspect-video"><iframe src="https://www.youtube.com/embed/XXXX" ...></iframe></div>
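The embed conversion described in the last step could look roughly like this for YouTube. It is a sketch only: the 11-character video ID pattern, the iframe attributes, and the rule that a paragraph must contain nothing but the URL are assumptions, and Vimeo would need its own pattern.

```python
import re

# Matches a bare watch URL; the 11-character ID length is an assumption
YOUTUBE_RE = re.compile(
    r'https?://(?:www\.)?youtube\.com/watch\?v=([\w-]{11})[^\s<"]*'
)

EMBED_TEMPLATE = (
    '<div class="aspect-video">'
    '<iframe src="https://www.youtube.com/embed/{video_id}" '
    'loading="lazy" allowfullscreen></iframe></div>'
)


def embed_youtube(paragraph_text):
    """Replace a paragraph that contains only a YouTube URL with an iframe."""
    match = YOUTUBE_RE.fullmatch(paragraph_text.strip())
    if not match:
        # URLs mentioned inside a sentence are left as ordinary text
        return paragraph_text
    return EMBED_TEMPLATE.format(video_id=match.group(1))
```

Requiring a full match keeps inline mentions of a video link from being turned into embeds in the middle of a sentence.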
python3 scripts/clean.py # Dry-run: prints changes
python3 scripts/clean.py --apply # Apply changes in-place
python3 scripts/clean.py --backup # Backup originals first
python3 scripts/clean.py --file x.html # Process single file
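The source does not show scripts/clean.py itself, so the following is only a hypothetical sketch of how the flags listed above could be wired up with `argparse`. The helper names and the exact interaction between `--apply` and `--backup` are assumptions.

```python
import argparse


def build_parser():
    """Command-line interface mirroring the four invocations listed above."""
    parser = argparse.ArgumentParser(description="Clean migrated WordPress HTML")
    parser.add_argument("--apply", action="store_true",
                        help="write changes in place (default is a dry run)")
    parser.add_argument("--backup", action="store_true",
                        help="copy each original before modifying it")
    parser.add_argument("--file", metavar="PATH",
                        help="process a single file instead of every file")
    return parser


def describe_run(args):
    """Summarize what a run would do, useful as the first line of dry-run output."""
    mode = "apply" if args.apply else "dry-run"
    target = args.file or "all files"
    return f"{mode}: {target}" + (" (with backup)" if args.backup else "")
```

Defaulting to a dry run means a mistyped command cannot overwrite migrated content.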
If you cloned the jekyllwind starter, you already have the base structure. Extend it for your migrated content:
my-jekyll-site/ # git clone https://github.com/koolamusic/jekyllwind
├── _config.yml # Site config, collections, defaults, plugins
├── _posts/ # Blog posts (YYYY-MM-DD-slug.html or .md)
├── _drafts/ # Unpublished posts (add this)
├── _layouts/ # Page templates (from starter)
│ ├── default.html # Base layout: <html>, <head>, nav, footer
│ ├── post.html # Blog post template
│ └── page.html # Generic page template
├── _includes/ # Reusable components (add as needed)
│ ├── header.html
│ ├── footer.html
│ └── post-card.html
├── pages/ # Standalone pages (about, contact, etc.)
├── assets/
│ ├── css/main.css # Tailwind directives (from starter)
│ └── images/uploads/ # Migrated WordPress media (YYYY/MM/)
├── Gemfile # Ruby dependencies (from starter)
├── package.json # Node dependencies (from starter)
├── tailwind.config.js # Tailwind theme config (from starter)
├── postcss.config.js # PostCSS pipeline (from starter)
└── netlify.toml # Deployment config (add this)
Map WordPress custom post types to Jekyll collections in _config.yml:
# Preserve WordPress URL structure
permalink: /:year/:month/:day/:title/
collections:
  # Example: WordPress 'portfolio' post type → Jekyll collection
  portfolio:
    output: true
    permalink: /portfolio/:title/
  # Example: WordPress 'talks' post type
  talks:
    output: true
    permalink: /talks/:title/
# Set default layouts per collection
defaults:
  - scope: { path: "", type: "posts" }
    values: { layout: "post" }
  - scope: { path: "", type: "portfolio" }
    values: { layout: "portfolio" }
  - scope: { path: "", type: "talks" }
    values: { layout: "portfolio" }
  - scope: { path: "pages" }
    values: { layout: "page" }
# _config.yml
title: "Your Site Name"
url: "https://your-site.com"
description: "Site description for SEO"
# Plugins
plugins:
  - jekyll-feed
  - jekyll-sitemap
  - jekyll-seo-tag
  - jekyll-postcss-v2 # Only if using Tailwind
# Feed configuration (generates RSS)
feed:
  collections:
    - posts
    - portfolio
You can optionally convert extracted HTML content to Markdown for easier editing. Not all content converts cleanly — complex layouts, tables, and embedded media may be better left as HTML.
Good candidates for Markdown conversion:
Keep as HTML:
Conversion approach:
import markdownify
# Convert HTML to Markdown, preserving images and links
markdown_content = markdownify.markdownify(
    html_content,
    heading_style="atx",       # Use # style headings
    bullets="-",               # Use - for unordered lists
    strip=['script', 'style']  # Remove script and style tags
)
| WordPress Field | Jekyll Frontmatter | Notes |
|----------------|-------------------|-------|
| Post title | title | Wrap in quotes if contains colons |
| Published date | date | Format: YYYY-MM-DD or YYYY-MM-DD HH:MM:SS +0000 |
| Slug | permalink | Only if overriding the default pattern |
| Categories | categories | Array: [cat1, cat2] |
| Tags | tags | Array: [tag1, tag2] |
| Featured image | image | Path to local file in assets/images/ |
| Meta description | description | From Yoast/RankMath or og:description |
| Author | author | String or reference to _data/authors.yml |
| Post status: draft | Move to _drafts/ | Drafts don't need a date prefix in filename |
| Custom fields | Custom frontmatter keys | Map ACF fields to meaningful frontmatter names |
| Password protected | protected: true | Implement client-side gating in layout |
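To illustrate the mapping table, here is a minimal sketch that renders a dict of migrated fields as a frontmatter block without a YAML library. `render_frontmatter` and the `UNQUOTED_KEYS` set are hypothetical; the sketch relies on JSON double-quoted strings being valid YAML scalars, which keeps titles containing colons safe.

```python
import json

# Assumption: date-like fields are emitted bare so Jekyll parses them as dates
UNQUOTED_KEYS = {"date", "last_modified_at"}


def render_frontmatter(fields):
    """Render a dict as a YAML frontmatter block delimited by --- lines."""
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, list):
            # Flow-style array, e.g. ["cat1", "cat2"]
            items = ", ".join(json.dumps(v, ensure_ascii=False) for v in value)
            rendered = f"[{items}]"
        elif isinstance(value, bool):
            rendered = "true" if value else "false"
        elif key in UNQUOTED_KEYS:
            rendered = str(value)
        else:
            # Double-quoted string; colons and quotes inside are escaped safely
            rendered = json.dumps(str(value), ensure_ascii=False)
        lines.append(f"{key}: {rendered}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

For nested custom fields a real YAML serializer is the safer choice; this sketch only covers the flat strings, booleans, and arrays shown in the table.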
bundle exec jekyll serve --livereload # Dev server with auto-reload
bundle exec jekyll serve --drafts # Include drafts
bundle exec jekyll build # Production build to _site/
# netlify.toml
[build]
command = "bundle exec jekyll build"
publish = "_site"
[build.environment]
JEKYLL_ENV = "production"
RUBY_VERSION = "3.2.0"
NODE_VERSION = "18"
# .github/workflows/jekyll.yml
name: Deploy Jekyll
on:
  push:
    branches: [main]
jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.2'
          bundler-cache: true
      - run: bundle exec jekyll build
      - uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./_site
Preserve WordPress URLs that changed during migration:
# netlify.toml redirects
[[redirects]]
from = "/feed/"
to = "/feed.xml"
status = 301
[[redirects]]
from = "/wp-content/uploads/*"
to = "/assets/images/uploads/:splat"
status = 301
For Jekyll-native redirects, use the jekyll-redirect-from gem:
# In a post's frontmatter
redirect_from:
- /old-url/
- /another-old-url/
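When many URLs changed, the netlify.toml entries shown earlier in this section can be generated from a list of (old, new) pairs instead of being written by hand. A small sketch, with `netlify_redirects` as a hypothetical helper; it does no escaping, so it assumes paths without quotes or backslashes.

```python
def netlify_redirects(pairs, status=301):
    """Build [[redirects]] blocks for netlify.toml from (old, new) path pairs."""
    blocks = []
    for old, new in pairs:
        blocks.append(
            "[[redirects]]\n"
            f'from = "{old}"\n'
            f'to = "{new}"\n'
            f"status = {status}\n"
        )
    # Blank line between blocks, matching the hand-written layout
    return "\n".join(blocks)
```

The pairs can come from comparing each page's original WordPress URL with the permalink Jekyll ends up generating for it.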
WordPress lazy loading: data-src vs src — Lazy-loading plugins store the real image URL in data-src and a placeholder in src. Always check data-src first.
Image query parameters must be stripped — WordPress appends ?resize=800,600&ssl=1. These break on static hosting.
Gutenberg comments have varied syntax — Some are self-closing (<!-- wp:jetpack/slideshow {...} /-->), some wrap content. Use regex: <!--\s*/?wp:.*?-->.
Visual Composer nesting is extreme — A single image can be wrapped in 5+ layers of divs. Your cleanup script needs multiple unwrapping passes.
cssnano + csso/css-tree incompatibility — If using PostCSS, do NOT add cssnano. It pulls in csso which breaks with certain css-tree versions.
jekyll-postcss-v2 requires empty frontmatter — Your CSS file must start with ---\n--- for Jekyll to process it through PostCSS.
Tailwind arbitrary calc() values fail with spaces — w-[calc(100%-2rem)] works; w-[calc(100% - 2rem)] does not.
Inline style attributes override Tailwind dark mode — WordPress content with style="color: #333" overrides dark:text-white. Strip all inline styles during cleanup.
Multiple collections can share a layout — Use _config.yml defaults to assign the same layout to similar collection types, avoiding duplication.
Preserve WordPress permalink structure — Set permalink: /:year/:month/:day/:title/ to maintain existing URLs and prevent 404s from external links and search engines.
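As a quick check of the Gutenberg note above, the regex can be exercised against both comment styles. This is a sketch; the `re.DOTALL` flag is an addition not in the original pattern, included on the assumption that a block's JSON attributes may span several lines.

```python
import re

# Same pattern as in the cleanup step; DOTALL is an added assumption
GUTENBERG_RE = re.compile(r"<!--\s*/?wp:.*?-->", re.DOTALL)


def strip_gutenberg(html):
    """Remove opening, closing, and self-closing Gutenberg block comments."""
    return GUTENBERG_RE.sub("", html)


sample = (
    "<!-- wp:paragraph --><p>Hi</p><!-- /wp:paragraph -->"
    '<!-- wp:jetpack/slideshow {"ids":[1,2]} /-->'
)
cleaned = strip_gutenberg(sample)
```

Ordinary HTML comments are left alone because the pattern requires `wp:` immediately after the comment opener.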