skills/53-keemanxp-thematic-analysis-skill/thematic-analysis/SKILL.md
Conduct rigorous thematic analysis (TA) of qualitative data following Braun and Clarke's (2006) six-phase framework. Use whenever the user mentions 'thematic analysis', 'TA', 'Braun and Clarke', 'qualitative coding', 'identifying themes', or asks for help analysing interviews, focus groups, open-ended survey responses, or transcripts to identify patterns. Also trigger for questions about inductive vs theoretical coding, semantic vs latent themes, essentialist vs constructionist epistemology, building a thematic map, or writing up a qualitative findings section. Covers all six phases, the four upfront analytic decisions, the 15-point quality checklist, and the five common pitfalls. Produces a Word document write-up and an annotated thematic map. Does NOT cover IPA, grounded theory, discourse analysis, conversation analysis, or narrative analysis — use a different method for those.
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research thematic-analysis
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill walks a user through conducting a rigorous thematic analysis (TA) on qualitative data, following the six-phase framework from Braun and Clarke (2006). It produces a Word document (.docx) write-up of the analysis and an annotated thematic map (PNG).
The skill is grounded in one source:
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.
Where this skill cites the paper, treat those statements as the method's published position, not Claude's own.
Read these reference files as needed:
- references/upfront-decisions.md: The four analytic decisions to settle before coding starts. Consult during Phase 1 (interview).
- references/coding-guide.md: How to generate codes well. Consult during Phase 3 (generating initial codes).
- references/theme-development.md: How to move from codes to themes, with worked examples. Consult during Phases 4–6.
- references/thematic-map.md: How to build and annotate the thematic map. Consult during Phase 6.
- references/quality-checklist.md: The 15-point checklist for assessing the analysis. Consult before producing the final write-up.
- references/pitfalls.md: The five common pitfalls. Consult after the first draft of the write-up.
Also read these skills before generating outputs:
- /mnt/skills/public/docx/SKILL.md: Required for the Word document.
- /mnt/skills/user/apa-referencing/SKILL.md: If the user wants citations to existing literature in the analysis, format them in APA 7th Edition.
If the user has a writing-style skill, do not apply it to the manuscript body; see "Writing register" under Phase 7. A writing-style skill may still apply to ancillary outputs (a plain-language summary, a blog version of the findings) if the user asks for those separately.
Before any of the six phases begin, elicit the research question(s) or objective(s) explicitly. This is the first action of the skill and is non-negotiable. The research question disciplines what counts as interesting in the data, which codes earn their keep, and which patterns rise to the level of a theme. Coding without a clear question tends to drift into surface description.
Prompt the user along these lines:
Before we begin the analysis, please state the research question(s) or objective(s) for this study. If there is more than one, list them in order of priority. If they are still in draft form, share the draft — we can sharpen them together before coding starts.
If the user is unsure or only has a study aim, help them turn it into a workable analytic question. A good TA research question is broad enough to allow patterned meaning to surface across the data set, but narrow enough to discipline what is included and excluded.
Record the agreed research question(s) verbatim. They will be referenced explicitly in every subsequent phase:
If the analytic approach is theoretical/deductive, the research question is also tied to the theoretical framework being applied — make this link explicit at this stage, before any coding begins.
Save the agreed research question(s) to the workspace as step0_research_questions.md. Refer back to this file at the start of each subsequent phase.
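As a minimal sketch of the save step, the helper below writes the questions verbatim and in priority order. The function name and the markdown layout of the file are assumptions made for illustration; only the file name step0_research_questions.md comes from the skill.

```python
from pathlib import Path


def save_research_questions(questions: list[str], workspace: Path) -> Path:
    """Write the agreed research questions verbatim, in priority order."""
    if not questions:
        raise ValueError("At least one research question is needed before coding.")
    # Hypothetical layout: a heading followed by a numbered list.
    lines = ["# Research questions (agreed verbatim)", ""]
    lines += [f"{rank}. {question}" for rank, question in enumerate(questions, start=1)]
    path = workspace / "step0_research_questions.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Re-reading this file at the start of each later phase keeps the wording of the question fixed rather than paraphrased from memory.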
Before any analysis, gather what is needed to plan the TA. Offer the user two paths up front.
Ask the user whether they have any of the following:
Read uploaded transcripts using the appropriate tool (file-reading skill for .txt/.md, docx skill for .docx, pdf-reading skill for PDFs, xlsx skill for spreadsheet-formatted survey data). Then summarise what is in the corpus and ask the user to confirm.
If the user has no materials, gather the essentials conversationally. Adapt to what they offer; do not interrogate.
About the project:
The four upfront analytic decisions (see references/upfront-decisions.md for full guidance):
These decisions are inter-related. Tendencies cluster: realist + semantic + inductive + rich description; constructionist + latent + theoretical + detailed account. But other combinations are valid — what matters is that the choices are explicit and internally consistent.
Walk the user through each decision. Do not assume realist + semantic + inductive by default just because the paper notes this is the common (often unspoken) default. Ask.
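One way to keep the four decisions explicit is to record them in a small structure that refuses missing or unknown values. This is a sketch only: the class name, field names, and option labels are assumptions drawn from the wording above, not part of the skill.

```python
from dataclasses import dataclass

# Allowed values for each decision, using the labels from the text above.
OPTIONS = {
    "epistemology": {"realist", "constructionist"},
    "theme_level": {"semantic", "latent"},
    "approach": {"inductive", "theoretical"},
    "scope": {"rich description", "detailed account"},
}

# The two clusters the text describes as common tendencies.
COMMON_CLUSTERS = [
    ("realist", "semantic", "inductive", "rich description"),
    ("constructionist", "latent", "theoretical", "detailed account"),
]


@dataclass(frozen=True)
class AnalyticPlan:
    epistemology: str
    theme_level: str
    approach: str
    scope: str

    def __post_init__(self) -> None:
        # Every decision must be stated explicitly with a known value.
        for field_name, allowed in OPTIONS.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r}; expected one of {sorted(allowed)}")

    def follows_common_cluster(self) -> bool:
        """True if the plan matches one of the two typical combinations."""
        combo = (self.epistemology, self.theme_level, self.approach, self.scope)
        return combo in COMMON_CLUSTERS
```

A plan for which `follows_common_cluster()` returns False is still valid; it simply signals that the rationale for the combination should be written down in the plan summary.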
Before moving to Phase 2, produce a short plan summary and ask the user to confirm:
The first of Braun and Clarke's six phases. This phase is immersion.
Ask the user to confirm that transcription (if needed) has been done. The transcript must be at minimum a rigorous orthographic verbatim record — every word spoken, including non-verbal utterances where they carry meaning (laughter, sighs, "um", "you know"). TA does not require Jefferson-style detail.
In this phase:
Output of this phase: a familiarisation note for the user — a paragraph per data item summarising what struck you, plus a running list of initial ideas across the data set. Save this to the workspace as phase2_familiarisation.md.
If the data set is too large for full re-reading in one pass, do it in batches and combine the notes.
Before coding starts, re-read the research question(s) saved in step0_research_questions.md. Coding is inclusive but not undisciplined — the question is the compass.
A code identifies a feature of the data — semantic content or latent meaning — that appears interesting to the analyst. A code is the most basic segment of raw data that can be assessed in a meaningful way (Braun & Clarke, 2006, p. 18, citing Boyatzis).
Codes are not themes. Codes are smaller, narrower, more numerous. Themes come later.
For full guidance on what good coding looks like (including data-driven vs theory-driven approaches, manual vs software coding, inclusive coding, and contradictions), read references/coding-guide.md.
In this phase:
Output of this phase: a coded data table. For each data item, list the extracts and the code(s) applied to each. Save as phase3_codes.md. At the end, produce a consolidated code list with every code and the data extracts that sit under it.
A short worked example showing data → code, modelled on Braun and Clarke's Figure 1:
| Data extract | Codes applied |
|---|---|
| "it's too much like hard work I mean how much paper have you got to sign to change a flippin' name no I I mean no I no we we have thought about it half heartedly and thought no no I jus- I can't be bothered" | (1) Talked about with partner; (2) Too much hassle to change name |
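The consolidated code list described for this phase can be sketched as an inversion of the coded data table: from rows of (data item, extract, codes) to a mapping from each code to its extracts. The row layout and the placeholder item ids and extracts are assumptions; the two code names are taken from the worked example.

```python
from collections import defaultdict

# Each row: (data item id, extract, codes applied). Ids and extracts are placeholders.
coded_rows = [
    ("item_1", "extract A", ["Talked about with partner", "Too much hassle to change name"]),
    ("item_2", "extract B", ["Too much hassle to change name"]),
]


def consolidate(rows):
    """Group extracts under every code applied to them."""
    by_code = defaultdict(list)
    for item_id, extract, codes in rows:
        for code in codes:
            # An extract coded more than once appears under each of its codes.
            by_code[code].append((item_id, extract))
    return dict(by_code)


code_list = consolidate(coded_rows)
```

Keeping the data item id next to each extract preserves the link back to its context when the codes are later sorted into themes.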
A theme captures something important about the data in relation to the research question, and represents some level of patterned response or meaning across the data set.
Prevalence matters but is not decisive. A theme can appear in many items briefly, or in a few items at length. Researcher judgement — guided by the research question — decides what is a theme.
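To inform that judgement without replacing it, prevalence can be summarised in more than one way. The sketch below counts, per candidate theme, the number of data items it appears in, the number of extracts, and the total words; the function name and the input layout (theme mapped to (item id, extract) pairs) are assumptions.

```python
def prevalence_summary(theme_extracts):
    """Summarise how widely and how extensively each candidate theme appears."""
    summary = {}
    for theme, extracts in theme_extracts.items():
        summary[theme] = {
            # Breadth: distinct data items that contain the theme.
            "items": len({item_id for item_id, _ in extracts}),
            # Frequency: number of coded extracts.
            "extracts": len(extracts),
            # Depth: rough volume of talk, counted in whitespace-separated words.
            "words": sum(len(text.split()) for _, text in extracts),
        }
    return summary
```

These counts are descriptive aids only. A theme with low counts can still be retained if it says something important in relation to the research question.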
In this phase:
For guidance on building the map, see references/thematic-map.md.
Output of this phase: a draft thematic map (saved as phase4_initial_map.png or as a markdown outline if a visual is not yet practical) and a candidate theme list with the codes under each.
End this phase with candidate themes, sub-themes, and all coded extracts grouped under them. Do not discard anything yet — Phase 5 will tell you whether the themes hold.
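Where a visual map is not yet practical, the markdown outline mentioned for this phase could be generated along these lines. The nested-dictionary layout, the key names `sub_themes` and `codes`, and the outline wording are all assumptions for illustration.

```python
def map_outline(themes):
    """Render candidate themes, sub-themes, and codes as a nested markdown list."""
    lines = []
    for theme, parts in themes.items():
        lines.append(f"- Theme: {theme}")
        for sub_theme in parts.get("sub_themes", []):
            lines.append(f"  - Sub-theme: {sub_theme}")
        for code in parts.get("codes", []):
            lines.append(f"  - Code: {code}")
    return "\n".join(lines)
```

Because nothing is discarded at this stage, codes that do not yet fit anywhere can be kept under a temporary theme of their own so they remain visible in the outline.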
Refining the candidate themes. Some candidate themes will not survive. Some will collapse together. Some will split.
Use Patton's dual criterion (cited in Braun & Clarke, 2006, p. 20):
This phase has two levels of review.
Level 1 — Review at the level of the coded extracts. Read all the collated extracts under each candidate theme. Do they form a coherent pattern? If yes, move on. If no, decide whether the theme is broken or whether some extracts simply belong elsewhere. Rework as needed.
Level 2 — Review against the entire data set. Re-read the full data set. Two questions: (a) Does the candidate thematic map accurately reflect the meanings in the data set as a whole? (b) Has any new relevant data been missed in earlier coding? If so, code it now.
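As a rough aid to question (b), the sketch below lists data items that contribute no extract to any candidate theme, so they can be prioritised when re-reading. It does not replace the re-read itself; the function name and input layout are assumptions.

```python
def items_without_extracts(all_item_ids, theme_extracts):
    """Return data items that have no extract under any candidate theme."""
    covered = {
        item_id
        for extracts in theme_extracts.values()
        for item_id, _ in extracts
    }
    # Preserve the original order of the data set.
    return [item_id for item_id in all_item_ids if item_id not in covered]
```

An item appearing in this list is not necessarily a problem: it may simply hold nothing relevant to the research question, but that should be a decision rather than an oversight.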
When refinements stop adding anything substantial, stop. Endless re-coding has diminishing returns — Braun and Clarke compare further fiddling to "rearranging the hundreds and thousands on an already nicely decorated cake" (p. 21).
Output of this phase: a refined thematic map (phase5_refined_map.png) and a refined theme list.
Now define what each theme is and what it is not.
For each theme:
Output of this phase: the final theme list with definitions, sub-themes, and final names. Save as phase6_definitions.md.
Also produce the final thematic map (phase6_final_map.png) — this is the version that will appear in the write-up.
The final write-up. This is the last phase of Braun and Clarke's framework and the deliverable of the skill.
Before drafting the report, run through the 15-point checklist in references/quality-checklist.md. Flag any items the analysis does not yet meet and fix them.
Then read references/pitfalls.md and audit the draft against the five common pitfalls. The most frequent failures: (1) describing extracts instead of analysing them, and (2) using interview questions as themes.
The write-up must use a formal academic register suitable for peer-reviewed publication. This is the deliverable standard for the manuscript body and it overrides any personal writing-style skill the user has loaded. Those preferences apply to blogs, op-eds and informal pieces — not to the findings of a thematic analysis.
Concretely, the manuscript body follows these conventions:
If the user has a writing-style skill loaded, apply it only to ancillary outputs they request separately — for instance, a plain-language summary or a blog adaptation of the findings — not to the manuscript itself.
Use the docx skill to produce a manuscript-style .docx with this structure:
Title
Author / affiliation (if provided)
1. Introduction
   - Research question(s) and rationale
   - Brief note on the analytic approach and the four decisions
     (e.g. "An inductive, semantic, realist thematic analysis was conducted,
     aiming for a rich description across the full data set.")
2. Method
   - Data corpus and data set
   - Participants / sources (anonymised)
   - Data collection (brief, if relevant)
   - Analytic procedure: describe the six phases in your own words,
     citing Braun and Clarke (2006). Make the "how" explicit, not implicit.
   - Researcher positionality / reflexivity (if the user wants this)
3. Findings
   - Overview paragraph that names the themes and sketches the overall story
   - One section per theme. For each theme:
     * A definition paragraph
     * Sub-themes (if any), each with a brief definition
     * 2 to 4 illustrative data extracts per theme, each followed by
       analytic commentary that goes BEYOND paraphrase
     * Where relevant, link to existing literature
   - Include the final thematic map as a figure
4. Discussion (optional, depending on what the user wants)
   - Overall story across themes
   - Theoretical implications
   - Practical implications
   - Limitations
   - Future directions
References (APA 7th Edition, including Braun & Clarke, 2006)
For the findings section, do not paraphrase the extracts — paraphrasing is the most common failure of weak TA. The commentary should answer questions like: What does this theme mean? What assumptions underpin it? What are its implications? Why might participants talk about this in this way rather than another? (See Braun & Clarke, 2006, p. 24.)
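The structure above asks for 2 to 4 illustrative extracts per theme. A simple pre-draft check, sketched here with an assumed function name and input layout, flags themes outside that range.

```python
def extract_count_issues(theme_extracts, minimum=2, maximum=4):
    """Return themes whose number of illustrative extracts is outside the range."""
    return {
        theme: len(extracts)
        for theme, extracts in theme_extracts.items()
        if not minimum <= len(extracts) <= maximum
    }
```

The check covers only the number of extracts. Whether the commentary goes beyond paraphrase still has to be judged by reading it.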
Data extracts in the report should be:
Save the file as <study_title>_thematic_analysis.docx in /mnt/user-data/outputs/.
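A minimal sketch of building that file name follows. The rule for turning a study title into a file-safe string (runs of non-alphanumeric characters become a single underscore) is an assumption; the skill itself only specifies the suffix and the output directory.

```python
import re
from pathlib import Path


def output_path(study_title: str, out_dir: Path = Path("/mnt/user-data/outputs")) -> Path:
    """Build the .docx path for the write-up from the study title."""
    # Assumed rule: collapse anything that is not a letter or digit to "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", study_title).strip("_")
    if not slug:
        raise ValueError("Study title has no file-safe characters.")
    return out_dir / f"{slug}_thematic_analysis.docx"
```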
The thematic map (phase6_final_map.png) goes into the findings section as a figure. See references/thematic-map.md for how to generate it (use matplotlib with networkx-style layout, or a simple node-and-edge diagram).
Caption the figure with theme names, sub-theme names, and a short explanation of relationships if relevant.
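The caption can be assembled from the final theme list so that it stays consistent with the map. The sketch below assumes a mapping from each theme name to its sub-theme names; the caption wording is illustrative, not a required format.

```python
def build_caption(themes, relationships=""):
    """Compose a figure caption naming themes, sub-themes, and optional relationships."""
    parts = []
    for theme, sub_themes in themes.items():
        if sub_themes:
            parts.append(f"{theme} (sub-themes: {', '.join(sub_themes)})")
        else:
            parts.append(theme)
    caption = "Final thematic map. Themes: " + "; ".join(parts) + "."
    if relationships:
        # Optional short explanation of how the themes relate.
        caption += " " + relationships.strip()
    return caption
```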
Use present_files to give the user the .docx and the .png. Lead with the .docx.
After the first version is produced, expect revisions. Common iteration requests:
Treat each iteration as a targeted edit, not a full rewrite, unless the user asks for one.