Interactional Variation Online: harnessing emerging technologies in the digital humanities to analyse online discourse in different workplace contexts
The IVO corpus is a collection of approximately 170,000 words transcribed from recordings of virtual meetings held between July 2021 and July 2022; the recordings are itemised in the 'IVO_corpus_file_details' file. Recordings vary in length, number of participants, and meeting type.
The ‘IVO core meetings corpus’ comprises meetings 1-15: 15 recordings from four different institutional contexts, namely municipal council meetings (DCC), a non-governmental organisation promoting the arts (NCoL), an academic conference organising committee (TaLC) and a state-of-the-art software development company (GitLab). Some of these meetings are hybrid (i.e. some participants share the same physical location). The meetings are agenda-driven and can be characterised as workplace interaction. The four remaining recordings (16-19) are closer to interviews, training sessions or presentations than to meetings, so they are not included in the IVO core meetings corpus.
The IVO project was co-led by Anne O’Keeffe (anne.keeffe@mic.ul.ie) at Mary Immaculate College (MIC), Limerick, and Dawn Knight (KnightD5@cardiff.ac.uk) at the Centre for Language and Communication Research, Cardiff University. Over the course of the project, the full team comprised 2 Principal Investigators (PIs: Anne O’Keeffe, Dawn Knight), 5 Co-Investigators (CIs: Svenja Adolphs, Benjamin Cowan, Tania Fahey-Palma, Fiona Farr, Sandrine Peraldi), 1 Postdoctoral Researcher and 2 Research Associates. In addition, there were 9 academic advisors (see https://ivohub.com/gallery/). The project was co-funded by the Arts and Humanities Research Council (AHRC) and the Irish Research Council (IRC).
The data in this corpus has been anonymised using a combination of manual and automated techniques. In addition to transcriptions of speech, the IVO core meetings corpus is annotated for selected nonverbal features: backchannels (both head nods and spoken backchannels) in the first and last five minutes of a recording, emblematic gestures, and meaningful gestures, for each visible participant. These annotations are saved as .eaf files, which can be opened and reused in ELAN (see https://archive.mpi.nl/tla/elan). The extent to which each recording was annotated for these features varies from one file to the next and is detailed in IVO_corpus_file_details. Where more than one feature was annotated for a recording, the annotations were assembled into a single combined .eaf file.
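As a rough illustration of how the .eaf annotations might be inspected outside ELAN, the Python sketch below reads an .eaf file with the standard-library XML parser and counts time-aligned annotations per tier. It assumes the standard ELAN EAF layout (TIME_SLOT elements inside TIME_ORDER, and TIER elements containing ALIGNABLE_ANNOTATION entries); the function names are hypothetical, and the actual tier names used in the IVO files are not documented here, so any tier name you see in the example is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def annotations_from_root(root):
    """Return (tier_id, start_ms, end_ms, value) tuples for time-aligned annotations.

    Sketch only: REF_ANNOTATION entries on dependent tiers are skipped.
    """
    # Map each TIME_SLOT_ID to its time in milliseconds; unaligned slots have no TIME_VALUE.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            # Start/end are None if the referenced slot is unaligned.
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            value = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            rows.append((tier_id, start, end, value))
    return rows


def read_eaf(path):
    """Parse an .eaf file from disk (a path or an open binary file object)."""
    return annotations_from_root(ET.parse(path).getroot())


def count_by_tier(rows):
    """Count annotations per tier (tier naming in the IVO files is an assumption)."""
    return Counter(tier_id for tier_id, _start, _end, _value in rows)
```

For example, `count_by_tier(read_eaf("DCC1_emblems.eaf"))` would give a per-tier tally of the annotations in that file, assuming it is in the working directory. Reading the raw XML like this is only a convenience for quick counts; ELAN remains the intended tool for viewing the annotations against the video.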
The following files are included in this dataset:
- IVO_corpus_file_details: contains details of every recording in the corpus (i.e. the meeting sessions captured in it)
- Transcription conventions: guide to the conventions used in the corpus transcripts
- Youtube Links to GitLab Videos: links to the source files that were transcribed in the corpus (users can locate these files and recreate the full multimodal corpus)
- ivo_meetings: contains all of the transcripts (without timestamps) from the entire corpus in a single file (.txt), which can be uploaded to a digital concordancing tool (e.g. Sketch Engine) for further exploration
- ivo_meetingscore: contains all of the transcripts (without timestamps) from the IVO core meetings corpus (meetings 1-15) in a single file (.txt), which can be uploaded to a digital concordancing tool (e.g. Sketch Engine) for further exploration
- DCC1_emblems.eaf: DCC1 file annotated in ELAN - contains annotations for emblems only
- DCC2_combined.eaf: DCC2 file annotated in ELAN - contains annotations for backchannels at the start and end of the video (5 minutes each) and emblems
- DCC3_combined.eaf: DCC3 file annotated in ELAN - contains annotations for meaningful gestures and emblems
- DCC4_combined.eaf: DCC4 file annotated in ELAN - contains annotations for emblems only
- Git1_combined.eaf: Git1 file annotated in ELAN - contains annotations for meaningful gestures and emblems
- Git2_combined.eaf: Git2 file annotated in ELAN - contains annotations for meaningful gestures and emblems
- Git3_emblems.eaf: Git3 file annotated in ELAN - contains annotations for emblems only
- Git4_emblems.eaf: Git4 file annotated in ELAN - contains annotations for emblems only
- NCoL1_combined.eaf: NCoL1 file annotated in ELAN - contains annotations for meaningful gestures, emblems and backchannels at the start and end of the video (5 minutes each)
- NCoL2_emblems.eaf: NCoL2 file annotated in ELAN - contains annotations for emblems only
- NCoL4_combined.eaf: NCoL4 file annotated in ELAN - contains annotations for meaningful gestures, emblems and backchannels at the start and end of the video (5 minutes each)
- TaLC1_combined.eaf: TaLC1 file annotated in ELAN - contains annotations for meaningful gestures, emblems and backchannels at the start and end of the video (5 minutes each)
- TaLC2_emblems.eaf: TaLC2 file annotated in ELAN - contains annotations for emblems only
- TaLC3_emblems.eaf: TaLC3 file annotated in ELAN - contains annotations for emblems only
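The single-file transcripts (ivo_meetings and ivo_meetingscore) are intended for concordancing tools such as Sketch Engine. For a quick look without such a tool, a minimal keyword-in-context (KWIC) sketch in Python might look like the following. It assumes a UTF-8 plain-text file and uses a deliberately naive tokeniser that does not apply the corpus transcription conventions, so any word count it produces is only indicative and should not be expected to match the approximate 170,000-word figure exactly. The function names are hypothetical.

```python
import re
from pathlib import Path

# ASSUMPTION: naive tokeniser (lowercased letter runs with optional internal
# straight apostrophes); it ignores the corpus transcription conventions.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def kwic(tokens, node, span=4):
    """Yield (left context, node, right context) for each occurrence of node."""
    node = node.lower()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            yield left, tok, right


def load_tokens(path):
    """Read a transcript file (UTF-8 assumed) and tokenise it."""
    return tokenize(Path(path).read_text(encoding="utf-8"))
```

For example, `tokens = load_tokens("ivo_meetingscore.txt")` followed by iterating over `kwic(tokens, "agenda")` would list each occurrence of "agenda" with four words of context on either side, assuming the file is saved under that name. A dedicated concordancer will handle tokenisation, lemmatisation and sorting far more robustly than this sketch.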
Funding
AH/W001608/1
IRC/W001608/1
Specialist software required to view data files
ELAN, R
Language(s) in dataset
- English-Great Britain (EN-GB)