Commit 64403769 authored by Nicolas Denier

alert if the subset kwarg is required

parent 730c952e
1 merge request: !6 alert if the subset kwarg is required
Pipeline #3146 passed
name = "SpeechDatasets"
uuid = "ae813453-fab8-46d9-ab8f-a64c05464021"
authors = ["Lucas ONDEL YANG <lucas.ondel@cnrs.fr>", "Simon DEVAUCHELLE <simon.devauchelle@universite-paris-saclay.fr>", "Nicolas DENIER <nicolas.denier@lisn.fr>"]
version = "0.17.0"
authors = ["Lucas ONDEL YANG <lucas.ondel@cnrs.fr>", "Simon DEVAUCHELLE <simon.devauchelle@universite-paris-saclay.fr>", "Nicolas DENIER <nicolas.denier@cnrs.fr>"]
version = "0.17.2"
[deps]
SpeechFeatures = "6f3487c4-5ca2-4050-bfeb-2cf56df92307"
......@@ -11,5 +11,5 @@ MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
[compat]
JSON = "0.21"
julia = "1.10"
AudioSources = "0.3.0"
SpeechFeatures = "0.10.4"
......@@ -9,6 +9,14 @@ makedocs(
sitename="SpeechDatasets",
repo = Remotes.GitLab("gitlab.lisn.upsaclay.fr", "PTAL", "Datasets/SpeechDatasets.jl"),
doctest = false,
pages = [
"Home" => "index.md",
"Installation" => "installation.md",
"Examples" => "examples.md",
"API" => "api.md",
"Supported datasets" => "datasets.md",
"Add a new dataset" => "newdataset.md",
]
)
config = GitLabHTTPS()
......
# API
## Load a Dataset
To get data from a supported dataset, you only need one function:
```@docs
dataset(name::AbstractString, inputdir::AbstractString, outputdir::AbstractString)
Base.summary(dataset::SpeechDataset)
get_dataset_kwargs(name::String)
```
## Types
### SpeechDataset
```@docs
SpeechDatasetInfos
SpeechDatasetInfos(name::AbstractString)
SpeechDataset
SpeechDataset(infos::SpeechDatasetInfos, manifestroot::AbstractString, subset::AbstractString)
```
Access a single element with integer or id indexing
```julia
# ds::SpeechDataset
ds[1]
ds["1988-147956-0027"]
```
Access several elements by providing a list
```julia
ds[[1,4,7]]
ds[[8, 2, "777-126732-0015"]]
```
Get all annotations
```julia
ds.annotations
```
### Manifest items
```@docs
SpeechDatasets.ManifestItem
Recording
Annotation
AudioSources.load(r::Recording; start = -1, duration = -1, channels = r.channels)
AudioSources.load(r::Recording, a::Annotation)
SpeechDatasets.load_manifest(T::Type{<:Union{Recording, Annotation}}, path)
```
## Lexicons
```@docs
CMUDICT(path)
TIMITDICT(timitdir)
MFAFRDICT(path)
```
## Index
```@index
```
\ No newline at end of file
<svg version="1.1" width="200" height="200" xmlns="http://www.w3.org/2000/svg">
<ellipse id="petal" cx="52.5" cy="100" rx="42.5" ry="30"
stroke="black" stroke-opacity="0"
fill-opacity="1" fill="#08d87b"/>
<use href="#petal" transform="rotate(45, 100, 100)"/>
<use href="#petal" transform="rotate(90, 100, 100)"/>
<use href="#petal" transform="rotate(135, 100, 100)"/>
<use href="#petal" transform="rotate(180, 100, 100)"/>
<use href="#petal" transform="rotate(225, 100, 100)"/>
<use href="#petal" transform="rotate(270, 100, 100)"/>
<use href="#petal" transform="rotate(315, 100, 100)"/>
</svg>
\ No newline at end of file
<mask id="myMask">
<rect x="0" y="0" width="200" height="200" fill="white" />
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="black" transform="rotate(45,100,100)"/>
</mask>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(0.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(45.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(90.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(135.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(180.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(225.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(270.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(315.0, 100, 100)" mask="url(#myMask)"/>
<circle cx="100" cy="100" r="27.878" stroke="black" stroke-width="2.41" fill="yellow"/>
</svg>
<svg version="1.1" width="200" height="200" xmlns="http://www.w3.org/2000/svg">
<mask id="myMask">
<rect x="0" y="0" width="200" height="200" fill="white" />
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="black" transform="rotate(45,100,100)"/>
</mask>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(0.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(45.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(90.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(135.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(180.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(225.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(270.0, 100, 100)"/>
<ellipse cx="52.5" cy="100" rx="42.5" ry="25" stroke="black" stroke-width="2.41" fill="white" transform="rotate(315.0, 100, 100)" mask="url(#myMask)"/>
<circle cx="100" cy="100" r="27.878" stroke="black" stroke-width="2.41" fill="yellow"/>
</svg>
......@@ -41,13 +41,18 @@ function write_corpora_docs(io::IO)
println(io, join(corpus["authors"], ", "))
end
need_subset = false
if "subsets" in fields
need_subset = true
println(io, "")
println(io, "### Subsets")
println(io, join(corpus["subsets"], ", "))
end
kwargs = get_dataset_kwargs(corpus["name"])
if need_subset
kwargs = merge(kwargs, (;subset=""))
end
if ! isempty(kwargs)
println(io, "### Keyword arguments")
println(io, "```julia")
......
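As a rough illustration of the `merge` call in the hunk above, the sketch below shows how a `subset` entry is appended to a set of keyword arguments. It assumes that `get_dataset_kwargs` returns a `NamedTuple` (the source does not show its return type), and the `lang` key is a hypothetical placeholder.

```julia
# Assumption: the dataset's keyword arguments are held in a NamedTuple.
# `lang = "eng"` is a made-up entry used only for illustration.
kwargs = (; lang = "eng")

# Merging appends the `subset` key and keeps the existing entries in order.
merged = merge(kwargs, (; subset = ""))

# A dataset without any other keyword argument still ends up with `subset`,
# so the "Keyword arguments" section is no longer skipped for it.
only_subset = merge(NamedTuple(), (; subset = ""))

println(keys(merged))
println(isempty(only_subset))
```

With this in place, any corpus that declares subsets gets a non-empty `kwargs`, so the `! isempty(kwargs)` branch always documents the `subset` keyword for it.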
# Examples
```julia
using SpeechDatasets
ds = dataset("Mini LibriSpeech", "path/to/minils", "minils_output")
typeof(ds[26])
```
```@example
println("Tuple{Recording, Annotation}") # hide
```
\ No newline at end of file
# SpeechDatasets.jl
SpeechDatasets.jl provides a convenient, unified way to load a speech dataset, which can then be used with other PTAL tools.
A `SpeechDataset` instance consists of a set of recordings (info about audio data) and annotations.
## Contents
```@contents
Depth = 3
```
## Example
```julia
using SpeechDatasets
ds = dataset("Mini LibriSpeech", "path/to/minils", "minils_output")
typeof(ds[26])
Pages = ["index.md", "installation.md", "examples.md", "api.md", "datasets.md", "newdataset.md"]
```
```@example
println("Tuple{Recording, Annotation}") # hide
```
## Load a Dataset
```@docs
dataset(name::AbstractString, inputdir::AbstractString, outputdir::AbstractString)
Base.summary(dataset::SpeechDataset)
get_dataset_kwargs(name::String)
```
## License
This software is provided under the [CeCILL-C license](https://cecill.info/licences.en.html)
## Types
### SpeechDataset
```@docs
SpeechDatasetInfos
SpeechDatasetInfos(name::AbstractString)
SpeechDataset
SpeechDataset(infos::SpeechDatasetInfos, manifestroot::AbstractString, subset::AbstractString)
```
Access a single element with integer or id indexing
```julia
# ds::SpeechDataset
ds[1]
ds["1988-147956-0027"]
```
Access several elements by providing a list
```julia
ds[[1,4,7]]
ds[[8, 2, "777-126732-0015"]]
```
Get all annotations
```julia
ds.annotations
```
### Manifest items
```@docs
SpeechDatasets.ManifestItem
Recording
Annotation
AudioSources.load(r::Recording; start = -1, duration = -1, channels = r.channels)
AudioSources.load(r::Recording, a::Annotation)
SpeechDatasets.load_manifest(T::Type{<:Union{Recording, Annotation}}, path)
```
## Lexicons
```@docs
CMUDICT(path)
TIMITDICT(timitdir)
MFAFRDICT(path)
```
## Authors
## Index
- Lucas Ondel Yang
- Nicolas Denier
- Simon Devauchelle
```@index
```
\ No newline at end of file
![](https://ptal.lisn.upsaclay.fr/assets/lisn-ups-cnrs.png)
\ No newline at end of file
......@@ -194,6 +194,9 @@ function dataset(builder::DatasetBuilder, name::AbstractString, inputdir::Abstra
infos = SpeechDatasetInfos(name)
# Check subset value
if ! isempty(infos.subsets) && isempty(subset)
throw(ArgumentError("The subset argument is required for this dataset, try one of $(infos.subsets)."))
end
subset in [infos.subsets ; ""] || throw(ArgumentError("Subset $subset is not supported, try one of $(infos.subsets)."))
# Check lang value if provided
......
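The two checks in the hunk above can be tried in isolation with a minimal sketch. `check_subset` is a hypothetical helper, not a function of SpeechDatasets; it takes the list of subsets directly instead of a `SpeechDatasetInfos` value, and the subset names are made up.

```julia
# Hypothetical stand-in for the validation done inside `dataset`.
function check_subset(subsets::Vector{String}, subset::AbstractString)
    # A dataset that defines subsets requires the caller to pick one.
    if !isempty(subsets) && isempty(subset)
        throw(ArgumentError("The subset argument is required for this dataset, try one of $(subsets)."))
    end
    # Any other value must be a known subset; "" stays valid for datasets
    # that define no subsets at all.
    subset in [subsets; ""] || throw(ArgumentError("Subset $subset is not supported, try one of $(subsets)."))
    return subset
end

check_subset(["train", "dev"], "train")  # accepted
check_subset(String[], "")               # accepted: no subsets defined

# An empty subset is rejected when the dataset defines subsets.
try
    check_subset(["train", "dev"], "")
catch err
    println(err.msg)
end
```

The first check runs before the membership test, so a missing subset produces the more specific "required" message instead of silently passing because `""` belongs to the accepted list.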
  • 🤖 CI Bot @project_1247_bot_1b5f29a72d826746f0de20d4c092a6ca

    mentioned in commit 6966f66f
