Open-source cultural infrastructure

The open record of live music

An open-source archive preserving live music history before it disappears. Starting with the Netherlands, built to scale to any country. 107K+ historical events. 210+ venues. Growing daily.

107,193
Historical events
10,751
Future events
210
Venues
52,435
Performers

The problem

Concert history is disappearing

Every year, venues close and their websites go offline. Programming archives vanish. Decades of cultural history are lost beyond recovery. No single source preserves the full picture of live music anywhere.

Venues close

When a venue shuts its doors, its website follows. Years of programming history (who played, when, and on which bill) are gone overnight.

Websites expire

Even active venues routinely purge old event listings. Last season's concerts are already unreachable. Last decade's are unrecoverable.

No central record

Concert data is scattered across ticketing platforms, social media, and venue sites, each with its own format, lifespan, and limitations.


What we're building

An open-source data pipeline for live music

GROUPIES scrapes, normalizes, enriches, and preserves concert data automatically, every day. The pipeline is country-agnostic; it starts with the Netherlands because that's where the data is richest.

Step 1

Scrape

Daily automated collection from 210+ venue websites and aggregator platforms.
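Each venue gets its own dedicated scraper. The sketch below shows roughly what one could look like. It is written in Python for illustration, although the project's stated stack is PHP. It uses the standard-library `html.parser` and assumes a hypothetical listing page that wraps each event in `<div class="event">` with a `<time datetime="YYYY-MM-DD">` and an `<h3>` title. Real venue markup varies, and these class and tag choices are invented.

```python
from html.parser import HTMLParser


class VenueEventParser(HTMLParser):
    """Collects {"date", "title"} dicts from a hypothetical venue listing.

    Assumed markup (invented for this sketch; assumes no nested <div>s):
        <div class="event"><time datetime="YYYY-MM-DD">...</time><h3>Title</h3></div>
    """

    def __init__(self):
        super().__init__()
        self.events = []        # completed events
        self._current = None    # event being built, or None outside an event div
        self._in_title = False  # True while inside an event's <h3>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "event":
            self._current = {"date": None, "title": ""}
        elif self._current is not None and tag == "time":
            self._current["date"] = attrs.get("datetime")
        elif self._current is not None and tag == "h3":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False
        elif tag == "div" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.events.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._in_title and self._current is not None:
            self._current["title"] += data


html = """
<div class="event"><time datetime="2024-03-14">do 14 mrt</time><h3>Band A + Band B</h3></div>
<div class="event"><time datetime="2024-03-15">vr 15 mrt</time><h3>Solo Artist</h3></div>
"""
parser = VenueEventParser()
parser.feed(html)
print(parser.events)
```

Keeping one small parser per venue means a redesign on one site only breaks one scraper.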

Step 2

Normalize

Structured extraction of dates, artists, venues, and genres, regardless of source format.
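Dates are a typical normalization case, because Dutch venue sites often write them as free text such as "za 14 mrt 2025". The sketch below shows one way to map such strings to ISO 8601. The helper `normalize_dutch_date` and its month table are hypothetical and do not come from the GROUPIES code. The sketch also assumes the input contains a day, a Dutch month word, and a four-digit year.

```python
import re
from datetime import date
from typing import Optional

# Dutch month names and common abbreviations (assumed input vocabulary).
DUTCH_MONTHS = {
    "jan": 1, "januari": 1, "feb": 2, "februari": 2, "mrt": 3, "maart": 3,
    "apr": 4, "april": 4, "mei": 5, "jun": 6, "juni": 6, "jul": 7, "juli": 7,
    "aug": 8, "augustus": 8, "sep": 9, "sept": 9, "september": 9,
    "okt": 10, "oktober": 10, "nov": 11, "november": 11, "dec": 12, "december": 12,
}

# Day, month word (optionally followed by a period), four-digit year.
# search() skips any leading weekday such as "za" or "zaterdag".
DATE_RE = re.compile(r"(\d{1,2})\s+([a-z]+)\.?\s+(\d{4})")


def normalize_dutch_date(text: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if the text is not recognised."""
    m = DATE_RE.search(text.strip().lower())
    if not m:
        return None
    day, month_word, year = m.groups()
    month = DUTCH_MONTHS.get(month_word)
    if month is None:
        return None
    try:
        return date(int(year), month, int(day)).isoformat()
    except ValueError:  # impossible dates such as "31 feb 2025"
        return None


print(normalize_dutch_date("za 14 mrt 2025"))
```

Returning `None` rather than guessing keeps unparseable rows visible for review. A real scraper would also have to deal with listings that omit the year.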

Step 3

Enrich

Artist matching via MusicBrainz and local LLMs, plus genre classification, deduplication, and linking.
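Before any MusicBrainz or LLM lookup, a cheap first pass can collapse trivial spelling variants of the same name. The heuristic below is purely illustrative, and the helpers `artist_key` and `group_variants` are hypothetical. It does not describe GROUPIES' matching logic, and it calls neither MusicBrainz nor an LLM. It strips accents, case, punctuation, and a leading "the".

```python
import re
import unicodedata
from collections import defaultdict


def artist_key(name: str) -> str:
    """Crude matching key for an artist name (illustrative heuristic only)."""
    # Decompose accented characters and drop combining marks: "Björk" -> "Bjork".
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    key = stripped.casefold().replace("&", " and ")
    # Collapse every run of non-alphanumeric characters into a single space.
    key = re.sub(r"[^a-z0-9]+", " ", key).strip()
    # Treat "The Foo" and "Foo" as the same act (a lossy assumption).
    if key.startswith("the "):
        key = key[4:]
    return key


def group_variants(names):
    """Group raw name strings that share the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[artist_key(name)].append(name)
    return dict(groups)


print(group_variants(["The Black Keys", "Black Keys", "BLACK KEYS!", "Björk"]))
```

A key like this will also merge genuinely different artists who share a name. That is why an authoritative identifier such as a MusicBrainz ID is the better basis for final matching.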

Step 4

Preserve

Permanent structured archive. Every event stored, searchable, and available as open data.
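SQLite appears in the listed tech stack. The sketch below shows how a minimal archive schema might look using Python's built-in `sqlite3` module. All table and column names are hypothetical and do not reflect the actual GROUPIES schema. The design point it illustrates is the `UNIQUE` constraint combined with `INSERT OR IGNORE`: re-scraping the same listing every day does not create duplicate events.

```python
import sqlite3

# Hypothetical minimal schema; not the actual GROUPIES schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS venues (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE,
    city  TEXT
);
CREATE TABLE IF NOT EXISTS performers (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    mbid  TEXT UNIQUE            -- MusicBrainz identifier, NULL if unmatched
);
CREATE TABLE IF NOT EXISTS events (
    id         INTEGER PRIMARY KEY,
    venue_id   INTEGER NOT NULL REFERENCES venues(id),
    event_date TEXT NOT NULL,    -- ISO 8601 date string
    title      TEXT NOT NULL,
    UNIQUE (venue_id, event_date, title)  -- makes daily re-scrapes idempotent
);
CREATE TABLE IF NOT EXISTS event_performers (
    event_id     INTEGER NOT NULL REFERENCES events(id),
    performer_id INTEGER NOT NULL REFERENCES performers(id),
    PRIMARY KEY (event_id, performer_id)
);
"""


def store_event(conn, venue, event_date, title):
    """Insert a venue and an event, silently skipping rows that already exist."""
    conn.execute("INSERT OR IGNORE INTO venues (name) VALUES (?)", (venue,))
    (venue_id,) = conn.execute(
        "SELECT id FROM venues WHERE name = ?", (venue,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO events (venue_id, event_date, title) VALUES (?, ?, ?)",
        (venue_id, event_date, title),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for _ in range(2):  # simulate the same listing being scraped on two days
    store_event(conn, "Example Venue", "2024-03-14", "Band A + Band B")
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 1
```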

The pipeline runs on standard hosting infrastructure, with no cloud dependencies and no vendor lock-in.


The data

Substantial, structured, and growing

The GROUPIES archive is one of the most comprehensive structured concert datasets in the Netherlands, and the approach is designed to work for any country.

107K+

Historical events

Concert records dating back to 1955, sourced from venue archives and aggregator platforms.

210+

Venues tracked

From major concert halls to independent clubs across the Netherlands, each with its own dedicated scraper.

52K+

Unique performers

Artists matched and deduplicated across sources using MusicBrainz identifiers and AI-assisted matching.

1955

Earliest records

The archive reaches back over seven decades, capturing the evolution of live music culture and growing deeper every day.


Who it's for

Open data, open possibilities

GROUPIES is built for anyone who cares about live music data, whether you want to build on it or learn from it.

For developers

  • Structured data feeds in standard formats
  • Public API on the roadmap
  • Tech stack: PHP, SQLite, MusicBrainz, local LLMs
  • Open source: contribute scrapers or improve matching
  • Build concert discovery tools, data visualizations, research projects

For the music industry

  • Historical programming data for venues and festivals
  • Audience and booking pattern insights
  • Cultural preservation: document your venue's legacy
  • Cross-venue artist routing and touring data
  • Evidence base for cultural funding applications

Roadmap

Where we're headed

The archive is live and growing daily. Here's what comes next.

Daily automated scraping

Live — collecting from 210+ venues every day

Artist enrichment pipeline

Live — MusicBrainz matching and AI-assisted classification

Searchable public database

Browse and search the full archive by artist, venue, date, or genre

Open API

RESTful API for developers to query and build on the dataset

Community contributions

Submit corrections, add venues, contribute scrapers

Preservation partnerships

Collaborate with cultural institutions, libraries, and archives


Get involved

Let's preserve this together

Whether you're a funder, a venue, a developer, or just someone who thinks concert history matters, we'd like to hear from you.