ui/.ai/skills/laravel-data-chunking-large-datasets/SKILL.md
Process large datasets efficiently using chunk(), chunkById(), lazy(), and cursor() to reduce memory consumption and improve performance
npx skillsauth add noartem/kawa laravel-data-chunking-large-datasetsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Process large datasets efficiently by breaking them into manageable chunks to reduce memory consumption and improve performance.
// BAD: Loading all records into memory
$users = User::all(); // Could be millions of records!
foreach ($users as $user) {
$user->sendNewsletter();
}
// BAD: Even with select, still loads everything
$emails = User::pluck('email'); // Array of millions of emails
foreach ($emails as $email) {
Mail::to($email)->send(new Newsletter());
}
chunk()// Process 100 records at a time
User::chunk(100, function ($users) {
foreach ($users as $user) {
$user->calculateStatistics();
$user->save();
}
});
// With conditions
User::where('active', true)
->chunk(200, function ($users) {
foreach ($users as $user) {
ProcessUserJob::dispatch($user);
}
});
// Prevents issues when modifying records during iteration
User::where('newsletter_sent', false)
->chunkById(100, function ($users) {
foreach ($users as $user) {
$user->update(['newsletter_sent' => true]);
Mail::to($user)->send(new Newsletter());
}
});
// With custom column
Payment::where('processed', false)
->chunkById(100, function ($payments) {
foreach ($payments as $payment) {
$payment->process();
}
}, 'payment_id'); // Custom ID column
// Uses PHP generators, minimal memory footprint
User::where('created_at', '>=', now()->subDays(30))
->lazy()
->each(function ($user) {
$user->recalculateScore();
});
// With chunking size control
User::lazy(100)->each(function ($user) {
ProcessRecentUser::dispatch($user);
});
// Filter and map with lazy collections
$results = User::lazy()
->filter(fn($user) => $user->hasActiveSubscription())
->map(fn($user) => [
'id' => $user->id,
'revenue' => $user->calculateRevenue(),
])
->take(1000);
// Most memory-efficient for simple forward iteration
foreach (User::where('active', true)->cursor() as $user) {
$user->updateLastSeen();
}
// With lazy() for additional collection methods
User::where('verified', true)
->cursor()
->filter(fn($user) => $user->hasCompletedProfile())
->each(fn($user) => SendWelcomeEmail::dispatch($user));
class ExportUsersCommand extends Command
{
public function handle()
{
$file = fopen('users.csv', 'w');
// Write headers
fputcsv($file, ['ID', 'Name', 'Email', 'Created At']);
// Process in chunks to avoid memory issues
User::select('id', 'name', 'email', 'created_at')
->chunkById(500, function ($users) use ($file) {
foreach ($users as $user) {
fputcsv($file, [
$user->id,
$user->name,
$user->email,
$user->created_at->toDateTimeString(),
]);
}
// Optional: Show progress
$this->info("Processed up to ID: {$users->last()->id}");
});
fclose($file);
$this->info('Export completed!');
}
}
class SendCampaignJob implements ShouldQueue
{
public function handle()
{
$campaign = Campaign::find($this->campaignId);
// Process subscribers in chunks
$campaign->subscribers()
->where('unsubscribed', false)
->chunkById(50, function ($subscribers) use ($campaign) {
foreach ($subscribers as $subscriber) {
SendCampaignEmail::dispatch($campaign, $subscriber)
->onQueue('emails')
->delay(now()->addSeconds(rand(1, 10)));
}
// Prevent rate limiting
sleep(2);
});
}
}
class MigrateUserData extends Command
{
public function handle()
{
$bar = $this->output->createProgressBar(User::count());
User::with(['profile', 'settings'])
->chunkById(100, function ($users) use ($bar) {
DB::transaction(function () use ($users, $bar) {
foreach ($users as $user) {
// Complex transformation
$newData = $this->transformUserData($user);
NewUserModel::create($newData);
$bar->advance();
}
});
});
$bar->finish();
$this->newLine();
$this->info('Migration completed!');
}
}
class CleanupOldLogs extends Command
{
public function handle()
{
$deletedCount = 0;
ActivityLog::where('created_at', '<', now()->subMonths(6))
->chunkById(1000, function ($logs) use (&$deletedCount) {
$ids = $logs->pluck('id')->toArray();
// Batch delete for efficiency
ActivityLog::whereIn('id', $ids)->delete();
$deletedCount += count($ids);
$this->info("Deleted {$deletedCount} records so far...");
// Give database a breather
usleep(100000); // 100ms
});
$this->info("Total deleted: {$deletedCount}");
}
}
| Method | Use Case | Memory Usage | Notes |
| ------------- | ------------------------ | ---------------- | ---------------------------------------------- |
| chunk() | General processing | Moderate | May skip/duplicate if modifying filter columns |
| chunkById() | Updates during iteration | Moderate | Safer for modifications |
| lazy() | Large result processing | Low | Returns LazyCollection |
| cursor() | Simple forward iteration | Lowest | Returns Generator |
| each() | Simple operations | High (loads all) | Avoid for large datasets |
User::select('id', 'email', 'name')
->chunkById(100, function ($users) {
// Process with minimal data
});
// Ensure indexed columns in where clauses
User::where('status', 'active') // status should be indexed
->where('created_at', '>', $date) // created_at should be indexed
->chunkById(200, function ($users) {
// Process efficiently
});
User::withoutEvents(function () {
User::chunkById(500, function ($users) {
foreach ($users as $user) {
$user->update(['processed' => true]);
}
});
});
// Instead of updating each record
User::chunkById(100, function ($users) {
$ids = $users->pluck('id')->toArray();
// Bulk update with raw query
DB::table('users')
->whereIn('id', $ids)
->update([
'last_processed_at' => now(),
'processing_count' => DB::raw('processing_count + 1'),
]);
});
class ProcessLargeDataset extends Command
{
public function handle()
{
User::chunkById(100, function ($users) {
ProcessUserBatch::dispatch($users->pluck('id'))
->onQueue('heavy-processing');
});
}
}
class ProcessUserBatch implements ShouldQueue
{
public function __construct(
public Collection $userIds
) {}
public function handle()
{
User::whereIn('id', $this->userIds)
->get()
->each(fn($user) => $user->process());
}
}
test('processes all active users in chunks', function () {
// Create test data
User::factory()->count(150)->create(['active' => true]);
User::factory()->count(50)->create(['active' => false]);
$processed = [];
User::where('active', true)
->chunkById(50, function ($users) use (&$processed) {
foreach ($users as $user) {
$processed[] = $user->id;
}
});
expect($processed)->toHaveCount(150);
expect(count(array_unique($processed)))->toBe(150);
});
test('handles empty datasets gracefully', function () {
$callCount = 0;
User::where('id', '<', 0) // No results
->chunk(100, function ($users) use (&$callCount) {
$callCount++;
});
expect($callCount)->toBe(0);
});
Modifying filter columns during chunk()
// WRONG: May skip records
User::where('processed', false)
->chunk(100, function ($users) {
foreach ($users as $user) {
$user->update(['processed' => true]); // Changes the WHERE condition!
}
});
// CORRECT: Use chunkById()
User::where('processed', false)
->chunkById(100, function ($users) {
foreach ($users as $user) {
$user->update(['processed' => true]);
}
});
Not handling chunk callback returns
// Return false to stop chunking
User::chunk(100, function ($users) {
foreach ($users as $user) {
if ($user->hasIssue()) {
return false; // Stop processing
}
$user->process();
}
});
Ignoring database connection limits
// Consider connection timeouts for long operations
DB::connection()->getPdo()->setAttribute(PDO::ATTR_TIMEOUT, 3600);
User::chunkById(100, function ($users) {
// Long running process
});
Remember: When dealing with large datasets, always think about memory usage, query efficiency, and processing time. Chunk your data appropriately!
development
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
tools
Toolkit for interacting with and testing local web applications using Playwright. Supports verifying frontend functionality, debugging UI behavior, capturing browser screenshots, and viewing browser logs.
development
Review UI code for Web Interface Guidelines compliance. Use when asked to "review my UI", "check accessibility", "audit design", "review UX", or "check my site against best practices".
tools
Suite of tools for creating elaborate, multi-component claude.ai HTML artifacts using modern frontend web technologies (React, Tailwind CSS, shadcn/ui). Use for complex artifacts requiring state management, routing, or shadcn/ui components - not for simple single-file HTML/JSX artifacts.