Why I Started This at All
I like my working machines lean. Firefox is my daily cockpit, but I don’t want it to carry a decade of history in active memory. Still, that data – where I’ve been, when, and how often – is gold. It’s not just nostalgia; it’s signal:
- What topics I circle back to.
- Which domains sustain my attention.
- How research unfolds over days or months.
- What “deep work” looks like in browser form.
So I built a small tool inside my own PHP framework, Charm, which uses Laravel’s Eloquent under the hood. The job: ingest Firefox’s places.sqlite (bookmarks, places, visits, origins) into my own database so I can archive, analyze, and search it later – freeing Firefox to purge aggressively and stay snappy.
Back when I started this, my places.sqlite was small on my fresh Windows 10 workstation. The simplest importer was good enough: open SQLite, iterate rows, firstOrNew() in Eloquent, set a few properties, save(). You know the pattern. It’s the kind of pragmatic “nice feature” code you toss into a weekend experiment and forget because, well, it works.
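For the record, that first version looked roughly like this – reconstructed from memory, with BrowserPlace standing in as a hypothetical Eloquent model for my Charm app:

$pdo = new PDO('sqlite:' . __DIR__ . '/places.sqlite');

// Row-by-row: one SELECT, then one INSERT or UPDATE per row,
// each statement in its own implicit transaction.
foreach ($pdo->query('SELECT * FROM moz_places') as $row) {
    $place = BrowserPlace::firstOrNew(['source_id' => $row['id']]);
    $place->url = $row['url'];
    $place->title = $row['title'];
    $place->save(); // one round trip (and one fsync) per row
}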
Time passes. The database grows. My workflow centralizes on one machine. Suddenly places.sqlite has basically everything – and the original importer that once looked harmless is now a boat anchor.
One day I set it running. Four hours later, I killed it. Not because it wouldn’t finish eventually, but because… that’s not acceptable anymore.
The Moment I Decided to Fix It
I blocked out a lazy 1–2 hours on another quiet weekend. I know SQL. I know transactions. I know why row-by-row ORM work falls off a cliff past a certain scale. The issue wasn’t knowledge – it was priorities. This used to be a toy. Now it’s a dataset.
So I did what I always do on these micro-refactors: I stepped back and rewired the mental model.
What’s the job really?
- Two tables change over time in-place (moz_bookmarks, moz_origins).
- Two tables are essentially append-only (moz_places, moz_historyvisits).
- Firefox timestamps are microseconds since epoch; my DB expects timestamp/datetime.
- My target DB has a prefix; my app uses Eloquent inside Charm; the code should remain clean.
- And above all: the database wants set-based work, not a hailstorm of row-level chatter.
I didn’t need to “discover” new truths. I just had to apply the right truths, now that the data justified it.
Reframing the Import
I replaced the row loop with a bulk loader: read in chunks from source SQLite, write in batches to MariaDB, one transaction per table. For append-only tables, I only fetch deltas (WHERE id > max(source_id)). For mutable tables, I upsert everything. And because Firefox time is in microseconds, I convert it cleanly to UTC timestamps on ingest.
That’s it. Not glamorous. Just correct.
A few compact snippets tell the tale.
Database setup
I replaced PHP’s raw SQLite access with Eloquent for both databases, which also lets me use features like upsert:
use Illuminate\Database\Capsule\Manager as Capsule;
use Illuminate\Database\Connection;

$capsule = new Capsule();

// Target: MariaDB/MySQL (your app DB)
$capsule->addConnection([
    'driver' => 'mysql',
    'host' => '127.0.0.1',
    'database' => 'your_db',
    'username' => 'user',
    'password' => 'pass',
    'charset' => 'utf8mb4',
    'collation' => 'utf8mb4_unicode_ci',
    'prefix' => 'ab_', // if you use a table prefix
], 'target');

// Source: Firefox places.sqlite
$capsule->addConnection([
    'driver' => 'sqlite',
    'database' => __DIR__.'/places.sqlite',
    'prefix' => '',
], 'places');

$capsule->setAsGlobal();
$capsule->bootEloquent();

$dst = $capsule->getConnection('target');
$src = $capsule->getConnection('places');

$dst->disableQueryLog();
$src->disableQueryLog();
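For completeness, a minimal sketch of how a matching target table could be created through the same Capsule instance – column names line up with the snippets below, exact types are up to you, and the UNIQUE index on source_id is what makes the upserts and delta inserts idempotent:

use Illuminate\Database\Schema\Blueprint;

$schema = $capsule->schema('target');

if (!$schema->hasTable('browser_origins')) {
    $schema->create('browser_origins', function (Blueprint $t) {
        $t->id();
        $t->unsignedBigInteger('source_id')->unique(); // maps to moz_origins.id
        $t->string('prefix', 32);
        $t->string('host');
        $t->integer('frecency')->default(0);
        $t->timestamps();
    });
}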
The Tiny Time Converter
function mozUsToTimestamp(?int $us): ?string
{
    if (!$us || $us <= 0) {
        return null;
    }

    // Firefox stores microseconds since epoch; cut down to whole seconds.
    $dt = (new DateTimeImmutable('@' . intdiv($us, 1000000)))
        ->setTimezone(new DateTimeZone('UTC'));

    return $dt->format('Y-m-d H:i:s'); // switch to datetime(6) and add .%06d if you need microseconds
}
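A quick sanity check at the edges:

echo mozUsToTimestamp(1700000000000000); // "2023-11-14 22:13:20" (UTC)
var_dump(mozUsToTimestamp(0));           // NULL – e.g. never-visited places
var_dump(mozUsToTimestamp(null));        // NULL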
Upserting a Mutable Table (e.g., moz_origins)
Inside Charm, I use Eloquent’s query builder directly (no per-row models or events):
$src->table('moz_origins')
    ->select(['id', 'prefix', 'host', 'frecency'])
    ->orderBy('id')
    ->chunk(5000, function ($chunk) use ($dst) {
        $now = date('Y-m-d H:i:s');
        $payload = [];

        foreach ($chunk as $r) {
            $payload[] = [
                'source_id' => $r->id,
                'prefix' => $r->prefix,
                'host' => $r->host,
                'frecency' => $r->frecency,
                'created_at' => $now,
                'updated_at' => $now,
            ];
        }

        if ($payload) {
            // UNIQUE(source_id) on target
            $dst->table('browser_origins')->upsert(
                $payload,
                ['source_id'],
                ['prefix', 'host', 'frecency', 'updated_at']
            );
        }
    });
One statement per chunk. MariaDB does what it’s good at.
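If you’re curious what MariaDB actually receives, Eloquent’s query log will show it: one multi-row INSERT … ON DUPLICATE KEY UPDATE per chunk. A quick way to peek – the payload here is a made-up single row:

// A hypothetical one-row payload, shaped like $payload above
$sampleRows = [[
    'source_id' => 1,
    'prefix' => 'https://',
    'host' => 'example.org',
    'frecency' => 100,
    'created_at' => date('Y-m-d H:i:s'),
    'updated_at' => date('Y-m-d H:i:s'),
]];

$dst->enableQueryLog();
$dst->table('browser_origins')->upsert(
    $sampleRows,
    ['source_id'],
    ['prefix', 'host', 'frecency', 'updated_at']
);

// Roughly: insert into `ab_browser_origins` (...) values (?, ?, ?, ?, ?, ?)
//          on duplicate key update `prefix` = values(`prefix`), ...
print_r($dst->getQueryLog());
$dst->disableQueryLog();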
Delta Insert for an Append-Only Table (e.g., moz_places)
$maxId = (int) ($dst->table('browser_places')->max('source_id') ?? 0);

$src->table('moz_places')
    ->select(['id', 'url', 'title', 'rev_host', 'visit_count', 'hidden', 'typed', 'frecency', 'last_visit_date', 'guid'])
    ->where('id', '>', $maxId)
    ->orderBy('id')
    ->chunk(5000, function ($chunk) use ($dst) {
        $now = date('Y-m-d H:i:s');
        $rows = [];

        foreach ($chunk as $r) {
            $rows[] = [
                'source_id' => $r->id,
                'url' => $r->url,
                'title' => $r->title,
                'rev_host' => $r->rev_host,
                'visit_count' => $r->visit_count,
                'hidden' => $r->hidden,
                'typed' => $r->typed,
                'frecency' => $r->frecency,
                'last_visit_date' => mozUsToTimestamp((int) $r->last_visit_date),
                'guid' => $r->guid,
                'created_at' => $now,
                'updated_at' => $now,
            ];
        }

        if ($rows) {
            // UNIQUE(source_id) on target; duplicates are ignored by the engine
            $dst->table('browser_places')->insertOrIgnore($rows);
        }
    });
And yes, one transaction per table around those blocks. That change alone collapses hours of fsyncs into a quick, durable write.
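In code, that means wrapping each table’s chunk loop in a single transaction. A minimal sketch – importOrigins() and importPlaces() are hypothetical wrappers around the loops shown above:

// All chunks for one table commit as a single unit of work:
// no per-statement fsync, and a crash leaves no half-imported table.
$dst->transaction(function () use ($src, $dst) {
    importOrigins($src, $dst); // the moz_origins upsert loop
});

$dst->transaction(function () use ($src, $dst) {
    importPlaces($src, $dst);  // the moz_places delta-insert loop
});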
The Result (and Why It’s Not “Magic”)
I re-ran the importer on the same machine:
- Old script: I had canceled it after four hours.
- New loader: finished in seconds.
Why Loading places.sqlite Is So Interesting
Once the data lives in my own schema, the browser turns from a transient UI into a measurable knowledge flow.
I can see:
- Temporal attention: not just what I read, but when and how it clusters. Mornings spent on geopolitics; late-night deep dives into a research thread; a burst of docs across a codebase during a refactor.
- Domain gravities: which sites anchor my thinking, which ones are bridges, and which ones are just noise.
- Project signatures: how a product investigation unfolds in the browser before any code is written.
- Search-to-absorption cycles: a query, a click, three tabs later – did that evolve into a “loop” of learning, or end as a dead-end?
There’s ethical nuance here: this is my own browsing, archived privately, with the explicit aim of making my tools better and my mind lighter. Firefox stays slim because I can purge history aggressively; my loader preserves the trail for analysis. That’s the trade I want.
And while I don’t analyze the data deeply yet, I capture it in structured ways. So when I have time for such a project, I can work with clean data and focus fully on the analysis system on top of a large dataset. There’s also no risk that old history is gone or must be reconstructed in complicated ways from multiple places.sqlite files.
About AI’s Role (and What It Wasn’t)
I didn’t “ask AI to make it fast.” That’s not my style. I used ChatGPT as a pair to accelerate the articulation:
- I already knew the architecture – transactions, set-based writes, upsert vs append-only.
- I wanted to pressure-test choices, recall exact Eloquent affordances, and push to a clean interface inside Charm without detouring into Laravel app bootstrapping.
- The collaboration shaved time off: I stayed in flow, validated a few tactical edges (like how I prefer to handle the temporary SQLite connection), and kept the delta-focused mindset crisp.
It’s not intelligence replacement; it’s friction removal. The craft is still ours.
A Few Practicalities I Care About
I’ll spare you a bullet list, but three details matter to me:
- Schema honesty. If a source provides microseconds and I care about causality at fine granularity, I’ll switch to datetime(6) and store it. If seconds are enough, I down-convert intentionally. Either way, it’s a choice – visible in code, not an accident.
- Predictable reruns. A loader that only works once is a trap. Idempotency, deltas, and unique keys keep the system calm when I press “go” twice.
- Isolation in my stack. Charm provides the ergonomics; Eloquent provides the expressions. The importer doesn’t bleed assumptions into the rest of the app. When I’m done, Firefox can be purged and my stats remain.
Where This Goes Next
Now that the pipeline is instantaneous, the fun part begins:
- Exploratory stats that reflect reality: visit sequences, domain gravities, time-of-day patterns for deep work, and “topic migration” across weeks.
- Search that’s actually mine: cross-joining places, visits, and bookmarks with my notes, code commits, and writing.
- Signals for focus management: if a week skews too hard into reactive browsing, I want to see that as clearly as I’d see failing tests.
Because the data is under my control – and the loader is fast enough to re-run without thinking – I can iterate on questions without babysitting the pipeline.
Closing: The Quiet Satisfaction of Doing It Right
This wasn’t a heroic rewrite. It was the gentle, inevitable correction of a pattern that served its purpose and then outlived it. The first version was fine when the world was small. The new version respects the actual shape of the data and the strengths of the database.
I like that it took an afternoon. I like that it runs in seconds. I like that Firefox can stay nimble while my archives grow smarter. And I like that the solution is boring in the best way: set-based SQL, clear boundaries, and tools that get out of the way.
If your side projects have quietly become real data, it’s worth asking: is the code still telling the truth about the scale? Mine wasn’t. Now it does.
This post was created by me with support from AI (GPT-5). Illustrations were generated by me with Sora.