# SpeechDatasets.jl

A Julia package to download and prepare speech corpora.

## Installation

Make sure to add the [PTAL registry](https://gitlab.lisn.upsaclay.fr/PTAL/Registry)
to your Julia installation. Then install the package as usual:
```
pkg> add SpeechDatasets
```
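
If the PTAL registry is not configured yet, it can typically be added from the Pkg REPL (press `]` at the `julia>` prompt) before running the command above. This is a sketch that assumes the registry can be cloned directly from the URL linked above; adjust it if your setup needs a different clone URL:

```
pkg> registry add https://gitlab.lisn.upsaclay.fr/PTAL/Registry
pkg> add SpeechDatasets
```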

## Usage

```julia
julia> using SpeechDatasets

julia> dataset = MINILIBRISPEECH("outputdir", :train) # :dev | :test
...

julia> dataset = TIMIT("/path/to/timit/dir", "outputdir", :train) # :dev | :test

julia> dataset = INADIACHRONY("/path/to/ina_wav/dir", "outputdir", "/path/to/ina_csv/dir") # ina_csv dir optional
...

julia> dataset = AVID("/path/to/avid/dir", "outputdir")
...

julia> dataset = SPEECH2TEX("/path/to/speech2tex/dir", "outputdir")
...

julia> for ((signal, fs), supervision) in dataset
           # do something
       end

# Lexicons
julia> CMUDICT("outputfile")
...

julia> TIMITDICT("/path/to/timit/dir")
...
```
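
The iteration loop above yields `((signal, fs), supervision)` items. The sketch below shows one way to consume them, here to count utterances and sum their durations. It runs on a hypothetical in-memory stand-in rather than a real corpus, and it assumes that `signal` is a mono vector of samples and `fs` is the sampling rate in Hz. The actual element types returned by SpeechDatasets may differ, and `summarize` and `mock_dataset` are illustrative names, not part of the package:

```julia
"""
    summarize(dataset)

Hypothetical helper: iterate over `((signal, fs), supervision)` items and
return the number of utterances, their total duration in seconds, and the
collected supervisions. Assumes `signal` holds one sample per element
(mono audio) and `fs` is the sampling rate in Hz.
"""
function summarize(dataset)
    n = 0
    total_seconds = 0.0
    supervisions = Any[]
    for ((signal, fs), supervision) in dataset
        n += 1
        total_seconds += length(signal) / fs   # samples / (samples per second)
        push!(supervisions, supervision)       # kept as-is, whatever its type
    end
    return n, total_seconds, supervisions
end

# Hypothetical stand-in with the same element shape as a dataset item:
# ((samples, sampling rate), supervision). Real datasets may use other types.
mock_dataset = [
    ((zeros(16000), 16000), "hello world"),
    ((zeros(8000), 16000), "good morning"),
    ((zeros(24000), 8000), "speech"),
]

n, secs, labels = summarize(mock_dataset)
println("utterances: $n, total duration: $secs s")
```

Wrapping the loop in a function avoids Julia's global soft-scope rules for the accumulators and lets the same code run unchanged on any iterable with this element shape.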

## License

This software is provided under the [CeCILL-C license](https://cecill.info/licences.en.html) (see [`/license`](/license)).