Interactive biorobotics provides unique experimental potential to study the mechanisms underlying social communication but is limited by our ability to build expressive robots that exhibit the complex behaviours of birds and small mammals. An alternative to physical robots is to use virtual environments. Here, we designed and built a modular, audio-visual 2D virtual environment that allows multi-modal, multi-agent interaction to study mechanisms underlying social communication. The strength of the system is an implementation based on event processing that allows for complex computation. We tested this system in songbirds, which provide an exceptionally powerful and tractable model system to study social communication. We show that pair-bonded zebra finches (Taeniopygia guttata) communicating through the virtual environment exhibit normal call-timing behaviour, that males sing female-directed song, and that both males and females display high-intensity courtship behaviours to their mates. These results suggest that the environment provided is sufficiently natural to elicit these behavioural responses. Furthermore, as an example of complex behavioural annotation, we developed a fully unsupervised song motif detector and used it to manipulate the virtual social environment of male zebra finches based on the number of motifs sung. Our virtual environment represents a first step in real-time automatic behaviour annotation and animal–computer interaction using higher-level behaviours such as song. Our unsupervised acoustic analysis eliminates the need for annotated training data, thus reducing labour investment and experimenter bias.