API

Load a Dataset

To get data from a supported dataset, you only need one function:

SpeechDatasets.dataset (Function)
dataset(dataset, inputdir::AbstractString, outputdir::AbstractString; kwargs...)

Create a SpeechDataset object for dataset. inputdir is the directory containing the raw data; if inputdir does not exist and the data is freely available, the data is automatically downloaded into inputdir. outputdir is the directory where the summary files are stored. kwargs... are dataset-specific keyword arguments passed to dataset.
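The download behaviour described above amounts to a "fetch only if missing" check. The sketch below illustrates it with stdlib Julia only. `prepare_dirs`, its `fetch` callback and the `freely_available` flag are hypothetical names, not part of SpeechDatasets, and the real preparation step also writes the summary files, which is omitted here.

```julia
# Hypothetical sketch of the "download only when inputdir is missing" logic.
# `prepare_dirs`, `fetch` and `freely_available` are illustrative names only.
function prepare_dirs(fetch::Function, inputdir::AbstractString, outputdir::AbstractString;
                      freely_available::Bool = true)
    if !isdir(inputdir)
        # Corpora that cannot be downloaded must already be present locally.
        freely_available ||
            error("$inputdir not found and the corpus cannot be downloaded automatically")
        mkpath(inputdir)
        fetch(inputdir)   # stand-in for the actual download step
    end
    # Summary files would be written under `outputdir`.
    mkpath(outputdir)
    return inputdir, outputdir
end
```

Passing `fetch` as the first argument lets a caller use Julia's `do` block syntax for the download step.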


Types

SpeechDataset

Access a single element by integer position or by id:

# ds::SpeechDataset
ds[1]
ds["1988-147956-0027"]
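Supporting both kinds of indexing typically means keeping an id-to-position table next to the items. The following is a minimal sketch under that assumption; `ToyDataset` is a hypothetical stand-in, not the actual SpeechDataset definition.

```julia
# Hypothetical stand-in for SpeechDataset: items in order plus an
# id => position lookup table built once at construction time.
struct ToyDataset{T}
    ids::Vector{String}
    items::Vector{T}
    index::Dict{String,Int}
end

ToyDataset(ids::Vector{String}, items::Vector{T}) where {T} =
    ToyDataset{T}(ids, items, Dict(id => i for (i, id) in enumerate(ids)))

# Integer indexing: position in the dataset.
Base.getindex(ds::ToyDataset, i::Integer) = ds.items[i]
# String indexing: look up the position of the id, then index by position.
Base.getindex(ds::ToyDataset, id::AbstractString) = ds.items[ds.index[id]]
Base.length(ds::ToyDataset) = length(ds.items)
```

Dispatching on `Integer` versus `AbstractString` keeps both access styles under the same `ds[...]` syntax.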

Manifest items

SpeechDatasets.Recording (Type)
struct Recording{Ts<:AbstractAudioSource} <: ManifestItem
    id::AbstractString
    source::Ts
    channels::Vector{Int}
    samplerate::Int
end

A recording is an audio source associated with an id.

Constructors

Recording(id, source, channels, samplerate)
Recording(id, source[; channels = missing, samplerate = missing])

If the channels or the sample rate are not provided, they are read from source.

Warning

When preparing a large corpus, omitting the channels and/or the sample rate can drastically slow down preparation, as it forces every source to be read.
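The warning follows from how optional metadata is filled in: every recording built without channels or samplerate has to open its source. The sketch below models this with a read counter. `ToySource`, `read_info!` and `ToyRecording` are hypothetical and only mimic the constructor described above.

```julia
# Hypothetical audio source that counts how often its metadata is read.
mutable struct ToySource
    nchannels::Int
    samplerate::Int
    reads::Int
end

# Stand-in for inspecting the audio file (the expensive step).
function read_info!(src::ToySource)
    src.reads += 1
    return collect(1:src.nchannels), src.samplerate
end

struct ToyRecording
    id::String
    source::ToySource
    channels::Vector{Int}
    samplerate::Int
end

# Keyword constructor: touch the source only when some metadata is missing.
function ToyRecording(id, source::ToySource; channels = missing, samplerate = missing)
    if ismissing(channels) || ismissing(samplerate)
        chs, sr = read_info!(source)
        channels = ismissing(channels) ? chs : channels
        samplerate = ismissing(samplerate) ? sr : samplerate
    end
    return ToyRecording(id, source, channels, samplerate)
end
```

Supplying both values up front, for instance from a corpus metadata file, avoids one source read per recording.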

SpeechDatasets.Annotation (Type)
struct Annotation <: ManifestItem
    id::AbstractString
    recording_id::AbstractString
    start::Float64
    duration::Float64
    channel::Union{Vector, Colon}
    data::Dict
end

An "annotation" defines a segment of a recording on a single channel. The data field is an arbitrary dictionary describing the nature of the annotation. start and duration (in seconds) define where the segment is located within the recording recording_id.

Constructors

Annotation(id, recording_id, start, duration, channel, data)
Annotation(id, recording_id[; channel = missing, start = -1, duration = -1, data = missing])

If start and/or duration are negative, the segment covers the whole recording.
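Since start and duration are in seconds, loading a segment requires converting them to sample indices. Here is a sketch of that conversion, assuming 1-based sample indexing, truncation toward the earlier sample, and the negative-means-whole-recording rule; `segment_range` is a hypothetical helper, not part of the package.

```julia
# Hypothetical helper: convert a segment given in seconds into a range of
# 1-based sample indices for a signal of `nsamples` samples at rate `sr`.
function segment_range(start::Real, duration::Real, nsamples::Integer, sr::Integer)
    # Negative start and/or duration: use the whole recording.
    (start < 0 || duration < 0) && return 1:nsamples
    i0 = floor(Int, start * sr) + 1
    # Clip the end so the range never runs past the signal.
    i1 = min(nsamples, floor(Int, (start + duration) * sr))
    return i0:i1
end

segment_range(-1, -1, 32000, 16000)    # → 1:32000
segment_range(0.5, 1.0, 32000, 16000)  # → 8001:24000
```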

AudioSources.load (Method)
load(recording::Recording [; start = -1, duration = -1, channels = recording.channels])
load(recording, annotation)

Load the signal from a recording. start and duration (in seconds) select the portion of the signal to load, and channels selects which channels to load.

The function returns a tuple (x, sr) where:

  • x is an $N×C$ array, with $N$ the length of the signal and $C$ the number of channels;
  • sr is the sampling rate of the signal.
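As a sketch of working with the returned tuple, the helpers below rely only on the $N×C$ layout (rows are samples, columns are channels). They are hypothetical utilities, and a random matrix stands in for a loaded signal.

```julia
# Hypothetical helpers for an `(x, sr)` tuple where `x` is an N×C matrix.
signal_duration(t::Tuple{AbstractMatrix,Integer}) = size(t[1], 1) / t[2]  # seconds
num_channels(t::Tuple{AbstractMatrix,Integer}) = size(t[1], 2)
# Average the C columns into a single N×1 channel, keeping the sampling rate.
downmix(t::Tuple{AbstractMatrix,Integer}) = (sum(t[1]; dims = 2) ./ size(t[1], 2), t[2])

# Stand-in for a 2-second stereo signal sampled at 16 kHz.
x, sr = rand(32000, 2), 16000
signal_duration((x, sr))   # → 2.0
```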
AudioSources.load (Method)
load(r::Recording, a::Annotation)
load(t::Tuple{Recording, Annotation})

Load only a segment of the recording referenced in the annotation.
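A sketch of segment loading on an in-memory signal: the annotation's start and duration pick the rows, its channel field (a vector of indices or `:`) picks the columns, and the tuple method simply splats its argument. `ToyRec`, `ToyAnn` and `toyload` are hypothetical types and functions, and this sketch assumes non-negative start and duration.

```julia
# Hypothetical in-memory recording: an N×C matrix and its sampling rate.
struct ToyRec
    x::Matrix{Float64}
    sr::Int
end

# Hypothetical annotation: segment bounds in seconds and selected channels.
struct ToyAnn
    start::Float64
    duration::Float64
    channel::Union{Vector{Int},Colon}
end

# Slice the rows covered by the segment and the annotated channel(s).
function toyload(r::ToyRec, a::ToyAnn)
    i0 = floor(Int, a.start * r.sr) + 1
    i1 = min(size(r.x, 1), floor(Int, (a.start + a.duration) * r.sr))
    return r.x[i0:i1, a.channel], r.sr
end

# Tuple form: forward a (recording, annotation) pair to the method above.
toyload(t::Tuple{ToyRec,ToyAnn}) = toyload(t...)
```

The tuple method is convenient when iterating over collections of (recording, annotation) pairs.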


Lexicons

SpeechDatasets.CMUDICT (Method)
CMUDICT(path)

Return the pronunciation dictionary loaded from the CMU Sphinx dictionary. On the first call, the CMU dictionary is downloaded and stored in path; subsequent calls only read the file at path without downloading the data again.
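The download-once behaviour can be sketched as follows, together with a parser for the classic CMU dictionary text format (`;;;` comment lines, `WORD  PH1 PH2 ...` entries, alternative pronunciations written `WORD(1)`). `cmudict_sketch` and its `fetch` callback are hypothetical, and the return type `Dict{String,Vector{Vector{String}}}` is an assumption, not necessarily what CMUDICT returns.

```julia
# Hypothetical sketch: `fetch` stands in for the real download and is only
# called when `path` does not exist yet; later calls just parse the file.
function cmudict_sketch(fetch::Function, path::AbstractString)
    isfile(path) || fetch(path)
    lexicon = Dict{String,Vector{Vector{String}}}()
    for line in eachline(path)
        # Skip blank lines and ";;;" comment lines.
        (isempty(strip(line)) || startswith(line, ";;;")) && continue
        fields = split(line)
        # Alternative pronunciations are written WORD(1), WORD(2), ...
        word = replace(fields[1], r"\(\d+\)$" => "")
        push!(get!(lexicon, word, Vector{String}[]), String.(fields[2:end]))
    end
    return lexicon
end
```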

