I'm trying to use NLTK to learn NLP in Python.
A package called “panlex_lite” keeps giving me an error, so I tried the following:
import nltk
nltk.download('all', halt_on_error=False)
And it gives me the following error:
[nltk_data] | Downloading package panlex_lite to
[nltk_data] | /Users/Harshil/nltk_data...
[nltk_data] | Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
  File "", line 1, in
    nltk.download('all', halt_on_error = False)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 543, in incr_download
    for msg in self.incr_download(info.children, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 529, in incr_download
    for msg in self._download_list(info_or_id, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 572, in _download_list
    for msg in self.incr_download(item, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
    for msg in self._download_package(info, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
    for msg in _unzip_iter(filepath, zipdir, verbose=False):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
    outfile.write(contents)
OSError: [Errno 22] Invalid argument
Is there any way to fix this? I tried passing “halt_on_error = False”, but it still gives me an error.
Thanks.
Here is a “dirty” hack:
$ rm /Users/Harshil/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/Harshil/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed'  # Trick the index into treating panlex_lite as already installed.
>>> dler.download('all')
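If you would rather not touch the downloader's private attributes, a rough alternative is to walk the package index yourself and skip the problematic id. This is only a sketch: `ids_to_download` and `download_all_except` are hypothetical helper names, it assumes the failure is limited to panlex_lite, and it relies only on the public `Downloader.packages()` and `Downloader.download()` methods.

```python
def ids_to_download(all_ids, skip=("panlex_lite",)):
    """Return the package ids in their original order, minus the skipped ones."""
    skipped = set(skip)
    return [pkg_id for pkg_id in all_ids if pkg_id not in skipped]


def download_all_except(skip=("panlex_lite",), download_dir=None):
    """Download every package in the NLTK index except those in `skip`.

    Returns the list of package ids that failed to download or unzip.
    """
    # Imported here so the filtering helper above works without NLTK installed.
    from nltk.downloader import Downloader

    dler = Downloader()
    # packages() fetches the remote index, so this needs a network connection.
    all_ids = [pkg.id for pkg in dler.packages()]

    failed = []
    for pkg_id in ids_to_download(all_ids, skip):
        try:
            ok = dler.download(pkg_id, download_dir=download_dir, quiet=True)
        except OSError as err:
            # Same class of error as in the traceback above; record it and move on.
            print("Skipping {}: {}".format(pkg_id, err))
            ok = False
        if not ok:
            failed.append(pkg_id)
    return failed
```

Calling `download_all_except()` downloads one package at a time, so a single broken package no longer aborts the whole run, and the returned list tells you which ids to retry.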
Also, try earthy:
pip install earthy
TL;DR:
import earthy
path_to_nltk_data = '/home/yourusername/nltk_data/'
earthy.download('all', path_to_nltk_data)  # Excludes the third party (non-NLTK) packages.
To download panlex_lite only:
import earthy
earthy.download('panlex_lite', path_to_nltk_data)
To download all the third-party datasets that are not natively hosted on the nltk_data GitHub:
import earthy
earthy.download('third_party', path_to_nltk_data)