Public Datasets: Initial Release

We are happy to announce the initial release of our PLAICraft dataset. Access the data here or on the sidebar of our blog.

This is a novel dataset that captures multiplayer Minecraft interactions across five time-aligned modalities: game video, game output audio, microphone input audio, mouse actions, and keyboard actions. This enables the study of embodied behaviour in a rich sandbox world like Minecraft.
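For readers curious how the time-aligned modalities might be consumed together, below is a minimal sketch of replaying a session by merging per-modality event streams on a shared timestamp. The directory layout, file names, and field names (`sessions/dante_0001`, `keyboard.jsonl`, `t_ms`, and so on) are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch: interleave keyboard, mouse, and speech event logs by
# timestamp so they can be replayed alongside the game video and audio.
# File names and the "t_ms" field are assumptions, not the released schema.
import json
import heapq
from pathlib import Path


def load_events(path: Path, modality: str):
    """Yield (timestamp_ms, modality, event) tuples from a JSONL file."""
    with path.open() as f:
        for line in f:
            event = json.loads(line)
            yield (event["t_ms"], modality, event)


def merged_timeline(session_dir: Path):
    """Interleave all event streams for one session in timestamp order."""
    streams = [
        load_events(session_dir / "keyboard.jsonl", "keyboard"),
        load_events(session_dir / "mouse.jsonl", "mouse"),
        load_events(session_dir / "speech_segments.jsonl", "speech"),
    ]
    # heapq.merge assumes each individual stream is already sorted by time.
    return heapq.merge(*streams, key=lambda item: item[0])


if __name__ == "__main__":
    for t_ms, modality, event in merged_timeline(Path("sessions/dante_0001")):
        print(f"{t_ms:>10} ms  {modality:<8} {event}")
```

Because all modalities share a common clock, video and game-audio frames could then be looked up against this same timeline.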

The initial release comprises over 200 hours of gameplay from three anonymized players: Dante, Dana, and Morgan. Below are a few example clips from interactive player sessions in our rich, multimodal dataset.

Xander building house
Xander house built
Bedwars fighting
Xander tutoring
Xander exploring with friends
Morgan meeting strangers
Jungle exploration
Desert temple exploration
Team meeting and planning
Running away from mobs
Planning to make obsidian
Transporting villager to iron farm
Building iron farm
Building iron farm 2
Building iron farm 3
Building farm

We plan to make our full 10,000-hour dataset, comprising anonymized data from over 10,000 players around the world, available in the future. We are excited to continue training and testing embodied AI agents on this dataset, and we aim to unveil them when they are ready. As always, thank you to everyone participating in our research, and we encourage you to check back for more updates on PLAICraft!



What is PLAICraft?

PLAICraft is a research project run by the Pacific Laboratory for Artificial Intelligence (PLAI), a cutting-edge research group based in the Department of Computer Science at the University of British Columbia.