API
Load a Dataset
To get data from a supported dataset, you only need one function:
SpeechDatasets.dataset — Function
dataset(dataset, inputdir::AbstractString, outputdir::AbstractString; kwargs...)
Create a SpeechDataset object for dataset. inputdir is the directory containing the raw data. If inputdir does not exist and the data is freely available, the data will be downloaded automatically and placed in inputdir. outputdir is the directory where summary files will be stored. kwargs... are dataset-specific arguments passed to dataset.
Types
SpeechDataset
SpeechDatasets.SpeechDataset — Type
SpeechDataset
Store metadata about a speech dataset.
Access a single element with integer or id indexing:
# ds::SpeechDataset
ds[1]
ds["1988-147956-0027"]
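The two indexing styles can be illustrated with a small stand-in container. This is only a sketch of the general pattern (a vector of items plus an id lookup table); ToyItem, ToyDataset, and the id "toy-0002" are hypothetical names made up for illustration and are not the SpeechDatasets implementation.

```julia
# Hypothetical stand-in types; NOT the SpeechDatasets implementation.
struct ToyItem
    id::String
    text::String
end

struct ToyDataset
    items::Vector{ToyItem}
    index::Dict{String,Int}   # id => position in `items`
end

# Build the id lookup table once, at construction time.
ToyDataset(items::Vector{ToyItem}) =
    ToyDataset(items, Dict(item.id => i for (i, item) in enumerate(items)))

# Integer indexing returns the i-th item; string indexing looks the id up first.
Base.getindex(ds::ToyDataset, i::Integer) = ds.items[i]
Base.getindex(ds::ToyDataset, id::AbstractString) = ds.items[ds.index[id]]
Base.length(ds::ToyDataset) = length(ds.items)

ds = ToyDataset([ToyItem("1988-147956-0027", "hello"), ToyItem("toy-0002", "world")])
first_item = ds[1]
same_item = ds["1988-147956-0027"]
```

In this sketch both lookups return the same item, and an unknown id raises a KeyError from the dictionary lookup.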
Manifest items
SpeechDatasets.ManifestItem — Type
abstract type ManifestItem end
Abstract base type for all manifest items. Every manifest item should have an id field.
SpeechDatasets.Recording — Type
struct Recording{Ts<:AbstractAudioSource} <: ManifestItem
    id::AbstractString
    source::Ts
    channels::Vector{Int}
    samplerate::Int
end
A recording is an audio source associated with an id.
Constructors
Recording(id, source, channels, samplerate)
Recording(id, source[; channels = missing, samplerate = missing])
If the channels or the sample rate are not provided, they will be read from source.
When preparing a large corpus, omitting the channels and/or the sample rate can slow preparation down drastically, because every source then has to be read.
SpeechDatasets.Annotation — Type
struct Annotation <: ManifestItem
    id::AbstractString
    recording_id::AbstractString
    start::Float64
    duration::Float64
    channel::Union{Vector, Colon}
    data::Dict
end
An "annotation" defines a segment of a recording on a single channel. The data field is an arbitrary dictionary holding the content of the annotation. start and duration (in seconds) define where the segment is located within the recording recording_id.
Constructors
Annotation(id, recording_id, start, duration, channel, data)
Annotation(id, recording_id[; channel = missing, start = -1, duration = -1, data = missing])
If start and/or duration are negative, the segment is considered to span the whole length of the recording.
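The rule for negative values can be sketched as a conversion from seconds to sample indices. The helper segment_samples below is hypothetical and not part of SpeechDatasets; the rounding convention (floor, 1-based indices, clamping to the end of the recording) is an assumption made for illustration.

```julia
# Hypothetical helper, not part of SpeechDatasets: convert an annotation's
# (start, duration) in seconds into a range of sample indices.
function segment_samples(start::Real, duration::Real,
                         samplerate::Integer, nsamples::Integer)
    # A negative start and/or duration selects the whole recording.
    if start < 0 || duration < 0
        return 1:nsamples
    end
    lo = floor(Int, start * samplerate) + 1                       # Julia arrays are 1-based
    hi = min(nsamples, floor(Int, (start + duration) * samplerate))  # clamp to the end
    return lo:hi
end

# A 3-second recording at 16 kHz has 48000 samples.
whole = segment_samples(-1, -1, 16000, 48000)    # whole recording
part = segment_samples(0.5, 1.0, 16000, 48000)   # one second starting at 0.5 s
```

With these inputs, whole covers all 48000 samples and part covers 16000 samples.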
AudioSources.load — Method
load(recording::Recording [; start = -1, duration = -1, channels = recording.channels])
load(recording, annotation)
Load the signal from a recording. start and duration are expressed in seconds.
The function returns a tuple (x, sr), where x is an $N \times C$ array ($N$ is the length of the signal in samples and $C$ is the number of channels) and sr is the sampling rate of the signal.
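As a small illustration of the returned layout, the sketch below builds a stand-in for the (x, sr) tuple rather than calling load, so it runs without any audio file. The Float32 element type, the 16 kHz rate, and the two channels are assumptions chosen for the example.

```julia
# Stand-in for the tuple returned by `load`: 2 seconds of silence,
# 2 channels, 16 kHz. In real use, x and sr would come from `load(recording)`.
sr = 16000
x = zeros(Float32, 2 * sr, 2)

nsamples, nchannels = size(x)                 # N and C
duration = nsamples / sr                      # signal length in seconds
left = x[:, 1]                                # first channel as a length-N vector
mono = vec(sum(x, dims = 2)) ./ nchannels     # average of all channels
```

Because samples run along the first dimension, a single channel is a column of x, and the duration follows directly from the number of rows and the sampling rate.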
AudioSources.load — Method
load(r::Recording, a::Annotation)
load(t::Tuple{Recording, Annotation})
Load only a segment of the recording referenced in the annotation.
SpeechDatasets.load_manifest — Method
load_manifest(Annotation, path)
load_manifest(Recording, path)
Load a Recording or Annotation manifest from path.
Lexicons
SpeechDatasets.CMUDICT — Method
CMUDICT(path)
Return the pronunciation dictionary loaded from the CMU Sphinx dictionary. The CMU dictionary will be downloaded and stored at path. Subsequent calls only read the file at path and do not download the data again.
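The download-once behaviour described above follows a common caching pattern, sketched below. cached_read and fake_fetch are hypothetical names and this is not the CMUDICT implementation; the fetch function stands in for the actual download and returns a single toy dictionary line.

```julia
# Hypothetical helper illustrating the download-once pattern; `fetch` is any
# zero-argument function that returns the file content as a String.
function cached_read(fetch::Function, path::AbstractString)
    if !isfile(path)
        mkpath(dirname(path))    # create the parent directory if needed
        write(path, fetch())     # only reached on the first call
    end
    return read(path, String)
end

# Count how often the (fake) download is triggered.
calls = Ref(0)
fake_fetch() = (calls[] += 1; "HELLO  HH AH0 L OW1\n")

path = joinpath(mktempdir(), "cmudict", "toy.dict")
a = cached_read(fake_fetch, path)   # writes the file
b = cached_read(fake_fetch, path)   # reads the cached file
```

The second call returns the same content without invoking the fetch function again.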
SpeechDatasets.TIMITDICT — Method
TIMITDICT(timitdir)
Return the pronunciation dictionary provided by the TIMIT corpus (located in timitdir).
SpeechDatasets.MFAFRDICT — Method
MFAFRDICT(path)
Return the French pronunciation dictionary provided by MFA (french_mfa v2.0.0a).