scientific-skills/pydicom/SKILL.md
Python library for working with DICOM (Digital Imaging and Communications in Medicine) files. Use this skill when reading, writing, or modifying medical imaging data in DICOM format, extracting pixel data from medical images (CT, MRI, X-ray, ultrasound), anonymizing DICOM files, working with DICOM metadata and tags, converting DICOM images to other formats, handling compressed DICOM data, or processing medical imaging datasets. Applies to tasks involving medical image analysis, PACS systems, radiology workflows, and healthcare imaging applications.
npx skillsauth add K-Dense-AI/claude-scientific-skills pydicomInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Pydicom is a pure Python package for working with DICOM files, the standard format for medical imaging data. This skill provides guidance on reading, writing, and manipulating DICOM files, including working with pixel data, metadata, and various compression formats.
Use this skill when working with:
Install pydicom and common dependencies:
uv pip install pydicom
uv pip install pillow # For image format conversion
uv pip install numpy # For pixel array manipulation
uv pip install matplotlib # For visualization
For handling compressed DICOM files, additional packages may be needed:
uv pip install pylibjpeg pylibjpeg-libjpeg pylibjpeg-openjpeg # JPEG compression
uv pip install python-gdcm # Alternative compression handler
Read a DICOM file using pydicom.dcmread():
import pydicom
# Read a DICOM file
ds = pydicom.dcmread('path/to/file.dcm')
# Access metadata
print(f"Patient Name: {ds.PatientName}")
print(f"Study Date: {ds.StudyDate}")
print(f"Modality: {ds.Modality}")
# Display all elements
print(ds)
Key points:
dcmread() returns a Dataset objectds.PatientName) or tag notation (e.g., ds[0x0010, 0x0010])ds.file_meta to access file metadata like Transfer Syntax UIDgetattr(ds, 'AttributeName', default_value) or hasattr(ds, 'AttributeName')Extract and manipulate image data from DICOM files:
import pydicom
import numpy as np
import matplotlib.pyplot as plt
# Read DICOM file
ds = pydicom.dcmread('image.dcm')
# Get pixel array (requires numpy)
pixel_array = ds.pixel_array
# Image information
print(f"Shape: {pixel_array.shape}")
print(f"Data type: {pixel_array.dtype}")
print(f"Rows: {ds.Rows}, Columns: {ds.Columns}")
# Apply windowing for display (CT/MRI)
if hasattr(ds, 'WindowCenter') and hasattr(ds, 'WindowWidth'):
from pydicom.pixel_data_handlers.util import apply_voi_lut
windowed_image = apply_voi_lut(pixel_array, ds)
else:
windowed_image = pixel_array
# Display image
plt.imshow(windowed_image, cmap='gray')
plt.title(f"{ds.Modality} - {ds.StudyDescription}")
plt.axis('off')
plt.show()
Working with color images:
# RGB images have shape (rows, columns, 3)
if ds.PhotometricInterpretation == 'RGB':
rgb_image = ds.pixel_array
plt.imshow(rgb_image)
elif ds.PhotometricInterpretation == 'YBR_FULL':
from pydicom.pixel_data_handlers.util import convert_color_space
rgb_image = convert_color_space(ds.pixel_array, 'YBR_FULL', 'RGB')
plt.imshow(rgb_image)
Multi-frame images (videos/series):
# For multi-frame DICOM files
if hasattr(ds, 'NumberOfFrames') and ds.NumberOfFrames > 1:
frames = ds.pixel_array # Shape: (num_frames, rows, columns)
print(f"Number of frames: {frames.shape[0]}")
# Display specific frame
plt.imshow(frames[0], cmap='gray')
Use the provided dicom_to_image.py script or convert manually:
from PIL import Image
import pydicom
import numpy as np
ds = pydicom.dcmread('input.dcm')
pixel_array = ds.pixel_array
# Normalize to 0-255 range
if pixel_array.dtype != np.uint8:
pixel_array = ((pixel_array - pixel_array.min()) /
(pixel_array.max() - pixel_array.min()) * 255).astype(np.uint8)
# Save as PNG
image = Image.fromarray(pixel_array)
image.save('output.png')
Use the script: python scripts/dicom_to_image.py input.dcm output.png
Modify DICOM data elements:
import pydicom
from datetime import datetime
ds = pydicom.dcmread('input.dcm')
# Modify existing elements
ds.PatientName = "Doe^John"
ds.StudyDate = datetime.now().strftime('%Y%m%d')
ds.StudyDescription = "Modified Study"
# Add new elements
ds.SeriesNumber = 1
ds.SeriesDescription = "New Series"
# Remove elements
if hasattr(ds, 'PatientComments'):
delattr(ds, 'PatientComments')
# Or using del
if 'PatientComments' in ds:
del ds.PatientComments
# Save modified file
ds.save_as('modified.dcm')
Remove or replace patient identifiable information:
import pydicom
from datetime import datetime
ds = pydicom.dcmread('input.dcm')
# Tags commonly containing PHI (Protected Health Information)
tags_to_anonymize = [
'PatientName', 'PatientID', 'PatientBirthDate',
'PatientSex', 'PatientAge', 'PatientAddress',
'InstitutionName', 'InstitutionAddress',
'ReferringPhysicianName', 'PerformingPhysicianName',
'OperatorsName', 'StudyDescription', 'SeriesDescription',
]
# Remove or replace sensitive data
for tag in tags_to_anonymize:
if hasattr(ds, tag):
if tag in ['PatientName', 'PatientID']:
setattr(ds, tag, 'ANONYMOUS')
elif tag == 'PatientBirthDate':
setattr(ds, tag, '19000101')
else:
delattr(ds, tag)
# Update dates to maintain temporal relationships
if hasattr(ds, 'StudyDate'):
# Shift dates by a random offset
ds.StudyDate = '20000101'
# Keep pixel data intact
ds.save_as('anonymized.dcm')
Use the provided script: python scripts/anonymize_dicom.py input.dcm output.dcm
Create DICOM files from scratch:
import pydicom
from pydicom.dataset import Dataset, FileDataset
from datetime import datetime
import numpy as np
# Create file meta information
file_meta = Dataset()
file_meta.MediaStorageSOPClassUID = pydicom.uid.generate_uid()
file_meta.MediaStorageSOPInstanceUID = pydicom.uid.generate_uid()
file_meta.TransferSyntaxUID = pydicom.uid.ExplicitVRLittleEndian
# Create the FileDataset instance
ds = FileDataset('new_dicom.dcm', {}, file_meta=file_meta, preamble=b"\0" * 128)
# Add required DICOM elements
ds.PatientName = "Test^Patient"
ds.PatientID = "123456"
ds.Modality = "CT"
ds.StudyDate = datetime.now().strftime('%Y%m%d')
ds.StudyTime = datetime.now().strftime('%H%M%S')
ds.ContentDate = ds.StudyDate
ds.ContentTime = ds.StudyTime
# Add image-specific elements
ds.SamplesPerPixel = 1
ds.PhotometricInterpretation = "MONOCHROME2"
ds.Rows = 512
ds.Columns = 512
ds.BitsAllocated = 16
ds.BitsStored = 16
ds.HighBit = 15
ds.PixelRepresentation = 0
# Create pixel data
pixel_array = np.random.randint(0, 4096, (512, 512), dtype=np.uint16)
ds.PixelData = pixel_array.tobytes()
# Add required UIDs
ds.SOPClassUID = pydicom.uid.CTImageStorage
ds.SOPInstanceUID = file_meta.MediaStorageSOPInstanceUID
ds.SeriesInstanceUID = pydicom.uid.generate_uid()
ds.StudyInstanceUID = pydicom.uid.generate_uid()
# Save the file
ds.save_as('new_dicom.dcm')
Handle compressed DICOM files:
import pydicom
# Read compressed DICOM file
ds = pydicom.dcmread('compressed.dcm')
# Check transfer syntax
print(f"Transfer Syntax: {ds.file_meta.TransferSyntaxUID}")
print(f"Transfer Syntax Name: {ds.file_meta.TransferSyntaxUID.name}")
# Decompress and save as uncompressed
ds.decompress()
ds.save_as('uncompressed.dcm', write_like_original=False)
# Or compress when saving (requires appropriate encoder)
ds_uncompressed = pydicom.dcmread('uncompressed.dcm')
ds_uncompressed.compress(pydicom.uid.JPEGBaseline8Bit)
ds_uncompressed.save_as('compressed_jpeg.dcm')
Common transfer syntaxes:
ExplicitVRLittleEndian - Uncompressed, most commonJPEGBaseline8Bit - JPEG lossy compressionJPEGLossless - JPEG lossless compressionJPEG2000Lossless - JPEG 2000 losslessRLELossless - Run-Length Encoding losslessSee references/transfer_syntaxes.md for complete list.
Handle nested data structures:
import pydicom
ds = pydicom.dcmread('file.dcm')
# Access sequences
if 'ReferencedStudySequence' in ds:
for item in ds.ReferencedStudySequence:
print(f"Referenced SOP Instance UID: {item.ReferencedSOPInstanceUID}")
# Create a sequence
from pydicom.sequence import Sequence
sequence_item = Dataset()
sequence_item.ReferencedSOPClassUID = pydicom.uid.CTImageStorage
sequence_item.ReferencedSOPInstanceUID = pydicom.uid.generate_uid()
ds.ReferencedImageSequence = Sequence([sequence_item])
Work with multiple related DICOM files:
import pydicom
import numpy as np
from pathlib import Path
# Read all DICOM files in a directory
dicom_dir = Path('dicom_series/')
slices = []
for file_path in dicom_dir.glob('*.dcm'):
ds = pydicom.dcmread(file_path)
slices.append(ds)
# Sort by slice location or instance number
slices.sort(key=lambda x: float(x.ImagePositionPatient[2]))
# Or: slices.sort(key=lambda x: int(x.InstanceNumber))
# Create 3D volume
volume = np.stack([s.pixel_array for s in slices])
print(f"Volume shape: {volume.shape}") # (num_slices, rows, columns)
# Get spacing information for proper scaling
pixel_spacing = slices[0].PixelSpacing # [row_spacing, col_spacing]
slice_thickness = slices[0].SliceThickness
print(f"Voxel size: {pixel_spacing[0]}x{pixel_spacing[1]}x{slice_thickness} mm")
This skill includes utility scripts in the scripts/ directory:
Anonymize DICOM files by removing or replacing Protected Health Information (PHI).
python scripts/anonymize_dicom.py input.dcm output.dcm
Convert DICOM files to common image formats (PNG, JPEG, TIFF).
python scripts/dicom_to_image.py input.dcm output.png
python scripts/dicom_to_image.py input.dcm output.jpg --format JPEG
Extract and display DICOM metadata in a readable format.
python scripts/extract_metadata.py file.dcm
python scripts/extract_metadata.py file.dcm --output metadata.txt
Detailed reference information is available in the references/ directory:
Issue: "Unable to decode pixel data"
uv pip install pylibjpeg pylibjpeg-libjpeg python-gdcmIssue: "AttributeError" when accessing tags
hasattr(ds, 'AttributeName') or use ds.get('AttributeName', default)Issue: Incorrect image display (too dark/bright)
apply_voi_lut(pixel_array, ds) or manually adjust with WindowCenter and WindowWidthIssue: Memory issues with large series
hasattr() or get()save_as() with write_like_original=TrueOfficial pydicom documentation: https://pydicom.github.io/pydicom/dev/
development
Spectral similarity and compound identification for metabolomics. Use for comparing mass spectra, computing similarity scores (cosine, modified cosine), and identifying unknown compounds from spectral libraries. Best for metabolite identification, spectral matching, library searching. For full LC-MS/MS proteomics pipelines use pyopenms.
development
Convert files and office documents to Markdown. Supports PDF, DOCX, PPTX, XLSX, images (with OCR), audio (with transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPubs and more.
development
Generate comprehensive market research reports (50+ pages) in the style of top consulting firms (McKinsey, BCG, Gartner). Features professional LaTeX formatting, extensive visual generation with scientific-schematics and generate-image, deep integration with research-lookup for data gathering, and multi-framework strategic analysis including Porter Five Forces, PESTLE, SWOT, TAM/SAM/SOM, and BCG Matrix.
testing
Comprehensive markdown and Mermaid diagram writing skill. Use when creating any scientific document, report, analysis, or visualization. Establishes text-based diagrams as the default documentation standard with full style guides (markdown + mermaid), 24 diagram type references, and 9 document templates.