Cardiff University

Interactional Variation Online: harnessing emerging technologies in the digital humanities to analyse online discourse in different workplace contexts

dataset
posted on 2024-09-05, 13:33 authored by Dawn Knight, Anne O’Keeffe, Christopher Fitzgerald, Justin McNamara, Geraldine Mark, Sandrine Peraldi, Tania Fahey Palma, Fiona Farr, Benjamin Cowan, Svenja Adolphs

The IVO corpus is a collection of approx. 170,000 transcribed words of recorded virtual meetings held between July 2021 and July 2022, itemised in the 'IVO_corpus_file details' file. Recordings vary in length, number of participants, and meeting type.

The ‘IVO core meetings corpus’ comprises meetings 1-15: 15 recordings from four institutional contexts, namely municipal council meetings (DCC), a non-governmental organisation promoting the arts (NCoL), an academic conference organising committee (TaLC) and a state-of-the-art software development company (GitLab). Some of these meetings are hybrid (i.e. some participants are in the same location). The meetings are agenda-driven and can be defined as workplace interaction. The four remaining meetings (16-19) are more representative of interviews, training sessions or presentations than of meetings, and so are not included in the IVO core meetings corpus.

The IVO project was co-led by Anne O’Keeffe (anne.keeffe@mic.ul.ie) at Mary Immaculate College (MIC), Limerick, and Dawn Knight (KnightD5@cardiff.ac.uk) at the Centre for Language and Communication Research, Cardiff University. Over the course of the project, the full team comprised 2 Principal Investigators (PIs: Anne O’Keeffe, Dawn Knight), 5 Co-Investigators (CIs: Svenja Adolphs, Benjamin Cowan, Tania Fahey-Palma, Fiona Farr, Sandrine Peraldi), 1 Postdoctoral Researcher and 2 Research Associates. In addition, there were 9 academic advisors (see https://ivohub.com/gallery/). The project was co-funded by AHRC and IRC.

The data in this corpus have been anonymised using a combination of manual and automated techniques. In addition to transcriptions of speech, the IVO core meetings corpus is tagged for selected nonverbal features: backchannels (head nods and spoken) in the first and last five minutes, emblematic gestures, and meaningful gestures for each visible participant. These annotations are saved as .eaf files, which can be opened and reused in ELAN (see https://archive.mpi.nl/tla/elan). The extent of annotation varies from one recording to the next and is detailed in the IVO_corpus_file_details file. Where more than one feature was annotated, the annotations were assembled into a single combined .eaf file.
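For users who want to inspect the annotations outside ELAN, the following is a minimal sketch of reading time-aligned annotations from an .eaf file with Python's standard library. It relies only on the general EAF XML layout (TIME_SLOT, TIER and ALIGNABLE_ANNOTATION elements). The tier name and annotation value in the sample are hypothetical and are not taken from the IVO files; the actual tier names should be checked in ELAN or in the IVO_corpus_file_details file.

```python
import xml.etree.ElementTree as ET


def eaf_annotations(root):
    """Yield (tier_id, start_ms, end_ms, value) for each time-aligned annotation."""
    # Map each time-slot id to its time in milliseconds.
    # TIME_VALUE is optional in EAF, so unaligned slots are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            value = ann.findtext("ANNOTATION_VALUE", default="")
            yield tier_id, start, end, value


# Hypothetical miniature EAF document, for illustration only.
SAMPLE = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="emblems_S1" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>thumbs up</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
""".strip()

rows = list(eaf_annotations(ET.fromstring(SAMPLE)))
for tier_id, start, end, value in rows:
    print(tier_id, start, end, value)
```

To apply the same function to a downloaded file, pass it the root of the parsed file, e.g. `eaf_annotations(ET.parse("DCC1_emblems.eaf").getroot())`. The sketch ignores reference annotations (REF_ANNOTATION), which have no time slots of their own.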

The following files are included in this dataset:

  • IVO_corpus_file_details: contains all information about the corpus file recordings (i.e. the meeting sessions captured within)
  • Transcription conventions: guide to the conventions used in the corpus transcripts
  • Youtube Links to GitLab Videos: links to the source files that were transcribed in the corpus (users can locate these files and recreate the full multimodal corpus)
  • ivo_meetings: contains all of the transcripts (without timestamps) from the entire corpus in a single file (.txt). This can be uploaded to a digital concordancing tool for further exploration (e.g. Sketch Engine)
  • ivo_meetingscore: contains all of the transcripts (without timestamps) from the sample 'core' corpus in a single file (.txt). This can be uploaded to a digital concordancing tool for further exploration (e.g. Sketch Engine)
  • DCC1_emblems.eaf: DCC1 file annotated in ELAN - contains annotations for emblems only
  • DCC2_combined.eaf: DCC2 file annotated in ELAN - contains annotations for backchannels at the start and end of the video (5 minutes each) and emblems
  • DCC3_combined.eaf: DCC3 file annotated in ELAN - contains annotations for meaningful gestures and emblems
  • DCC4_combined.eaf: DCC4 file annotated in ELAN - contains annotations for emblems only
  • Git1_combined.eaf: Git1 file annotated in ELAN - contains annotations for meaningful gestures and emblems
  • Git2_combined.eaf: Git2 file annotated in ELAN - contains annotations for meaningful gestures and emblems
  • Git3_emblems.eaf: Git3 file annotated in ELAN - contains annotations for emblems only
  • Git4_emblems.eaf: Git4 file annotated in ELAN - contains annotations for emblems only
  • NCoL1_combined.eaf: NCoL1 file annotated in ELAN - contains annotations for meaningful gestures, emblems and backchannels at the start and end of the video (5 minutes each)
  • NCoL2_emblems.eaf: NCoL2 file annotated in ELAN - contains annotations for emblems only
  • NCoL3_emblems.eaf: NCoL3 file annotated in ELAN - contains annotations for emblems only
  • NCoL4_combined.eaf: NCoL4 file annotated in ELAN - contains annotations for meaningful gestures, emblems and backchannels at the start and end of the video (5 minutes each)
  • TaLC1_combined.eaf: TaLC1 file annotated in ELAN - contains annotations for meaningful gestures, emblems and backchannels at the start and end of the video (5 minutes each)
  • TaLC2_emblems.eaf: TaLC2 file annotated in ELAN - contains annotations for emblems only
  • TaLC3_emblems.eaf: TaLC3 file annotated in ELAN - contains annotations for emblems only
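
As an alternative to uploading the single-file transcripts to a concordancing tool, a quick keyword-in-context (KWIC) search can be sketched in a few lines of Python. This is a rough illustration only: it treats the transcript as plain text, lowercases everything, and does not interpret any of the markup described in the Transcription conventions file. The function name `kwic` and the sample sentence are hypothetical.

```python
import re


def kwic(text, keyword, width=4):
    """Return (left context, keyword, right context) for each hit of keyword."""
    # Simple word tokenizer: letters/digits, with an optional straight-apostrophe suffix.
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Hypothetical sample text, not taken from the corpus.
sample = "Okay so the meeting is open and the agenda is next"
hits = kwic(sample, "the", width=2)
for left, kw, right in hits:
    print(f"{left:>20} | {kw} | {right}")
```

To run it over the corpus file, read the text first, e.g. `kwic(open("ivo_meetings.txt", encoding="utf-8").read(), "okay")`. The file encoding is an assumption and may need adjusting.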

Funding

AH/W001608/1

IRC/W001608/1

History

Specialist software required to view data files

ELAN, R

Language(s) in dataset

  • English-Great Britain (EN-GB)

Data-collection start date

2021-08-01

Data-collection end date

2024-06-30

    School of English, Communication and Philosophy