# SpeechDatasets.jl

A Julia package to download and prepare speech corpora.

## Installation

Make sure to add the [FAST registry](https://gitlab.lisn.upsaclay.fr/fast/registry)
to your Julia installation. Then install the package as usual:
```
pkg> add SpeechDatasets
```
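If the FAST registry is not installed yet, one way to add it is with the standard `registry add` command of the Pkg REPL mode (press `]` at the `julia>` prompt). This is a sketch that assumes the registry page URL above can also be used as its git clone URL:

```
pkg> registry add https://gitlab.lisn.upsaclay.fr/fast/registry
```

After that, `pkg> add SpeechDatasets` should resolve the package from the FAST registry.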

## Usage

```
julia> using SpeechDatasets

julia> dataset = MINILIBRISPEECH("outputdir", :train) # or :dev, :test
...

julia> dataset = TIMIT("/path/to/timit/dir", "outputdir", :train) # or :dev, :test

julia> dataset = INADIACHRONY("/path/to/ina_wav/dir", "outputdir", "/path/to/ina_csv/dir") # the ina_csv dir is optional
...

julia> dataset = AVID("/path/to/avid/dir", "outputdir")
...

julia> for ((signal, fs), supervision) in dataset
           # do something
       end

# Lexicons
julia> CMUDICT("outputfile")
...

julia> TIMITDICT("/path/to/timit/dir")
...
```

## License

This software is provided under the CeCILL 2.1 license (see [`/LICENSE`](/LICENSE)).