An open-source archive preserving live music history before it disappears. Starting with the Netherlands, built to scale everywhere. 107K+ events. 210+ venues. Growing daily.
Every year, venues close and their websites go offline. Programming archives vanish. Decades of cultural history are lost beyond recovery. No single source preserves the full picture of live music anywhere.
When a venue shuts its doors, its website follows. Years of programming history (who played, when, and in what combination) are gone overnight.
Even active venues routinely purge old event listings. Last season's concerts are already unreachable. Last decade's are unrecoverable.
Concert data is scattered across ticketing platforms, social media, and venue sites, each with its own format, lifespan, and limitations.
GROUPIES scrapes, normalizes, enriches, and preserves concert data automatically, every day. The pipeline is country-agnostic; it starts with the Netherlands because that's where the data is richest.
Daily automated collection from 210+ venue websites and aggregator platforms.
Structured extraction of dates, artists, venues, and genres, regardless of source format.
Artist matching via MusicBrainz and local LLMs. Genre classification, deduplication, linking.
Permanent structured archive. Every event stored, searchable, and available as open data.
The pipeline runs on standard hosting infrastructure: no cloud dependencies, no vendor lock-in.
The GROUPIES archive is one of the most comprehensive structured concert datasets in the Netherlands, and the approach is designed to work for any country.
Concert records dating back to 1955, sourced from venue archives and aggregator platforms.
From major concert halls to independent clubs across the Netherlands, each with its own dedicated scraper.
Artists matched and deduplicated across sources using MusicBrainz identifiers and AI-assisted matching.
The archive reaches back over seven decades, capturing the evolution of live music culture and growing deeper every day.
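How cross-source artist matching works is not spelled out here. As an illustration only, the following minimal sketch groups artist records from different sources. It assumes each record is a dict with a `name` and an optional MusicBrainz ID under a hypothetical `mbid` key, already filled in by an earlier matching step. No MusicBrainz API is called, and the name-folding rules are assumptions.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Fold accents and case, drop a leading 'the', collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    words = no_accents.casefold().split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def dedupe_artists(records: list[dict]) -> dict[str, list[dict]]:
    """Group records; the hypothetical 'mbid' key wins over name matching."""
    # Pass 1: learn which normalized names are already tied to an MBID.
    name_to_mbids: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        if rec.get("mbid"):
            name_to_mbids[name_key(rec["name"])].add(rec["mbid"])

    # Pass 2: name-only records inherit an MBID only when it is unambiguous.
    groups: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        nk = name_key(rec["name"])
        mbid = rec.get("mbid")
        candidates = name_to_mbids.get(nk, set())
        if not mbid and len(candidates) == 1:
            mbid = next(iter(candidates))
        key = f"mbid:{mbid}" if mbid else f"name:{nk}"
        groups[key].append(rec)
    return dict(groups)
```

If two different MBIDs share the same normalized name, a name-only record is deliberately left ungrouped under its name key instead of being guessed into one of them. Cases like that would need a further matching step.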
GROUPIES is built for anyone who cares about live music data, whether you want to build on it or learn from it.
The archive is live and growing daily. Here's what comes next.
Live: collecting from 210+ venues every day
Live: MusicBrainz matching and AI-assisted classification
Browse and search the full archive by artist, venue, date, or genre
RESTful API for developers to query and build on the dataset
Submit corrections, add venues, contribute scrapers
Collaborate with cultural institutions, libraries, and archives
Whether you're a funder, a venue, a developer, or just someone who thinks concert history matters, we'd like to hear from you.