Getting files from the internet

There are multiple options for this. First, create a temporary directory where you can practice, make mistakes and learn.

  • Use 'mkdir' in bash to create it:
  mkdir scratch
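
If the directory might already exist, 'mkdir -p' avoids an error, and you can move into it right away (a minimal sketch; 'scratch' is just the name used in this tutorial):

  mkdir -p scratch
  cd scratch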

You can use 'wget' to download files.

Let's suppose that you want to download a PDF file from a website. Use wget and include the link as an argument in the terminal:

  wget https://www.locus.ufv.br/bitstream/123456789/10320/1/texto%20completo.pdf
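
The URL above contains an encoded space, so the saved filename can look awkward. wget's '-O' option lets you choose the output name (a minimal sketch; 'texto-completo.pdf' is just an illustrative name):

  wget -O texto-completo.pdf https://www.locus.ufv.br/bitstream/123456789/10320/1/texto%20completo.pdf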

You can also download multiple files at once. First, put the URL of each file on its own line in a plain text file. Then use wget with the '-i' argument:

  echo https://zenodo.org/record/275433/files/SS2SmallScaleDairyExport20150605.xml?download=1 > FilesToDownload.txt
  echo https://zenodo.org/record/3962046/files/mountain_pastured_cows.csv?download=1 >> FilesToDownload.txt
  wget -i FilesToDownload.txt
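
If you want the downloads to land in a specific directory, the '-P' option sets the directory prefix (a minimal sketch; 'scratch' is the directory created earlier):

  wget -i FilesToDownload.txt -P scratch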

You can also use 'pandoc' to download a web page and convert it to another format.

Use the '-o' argument to name the output file:

  pandoc https://itsfoss.com/download-files-from-linux-terminal/ -o tutorial.org
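
If pandoc cannot guess the input format from the URL, you can state it explicitly with '-f' (a minimal sketch; the output format is inferred from the '.org' extension):

  pandoc -f html https://itsfoss.com/download-files-from-linux-terminal/ -o tutorial.org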

If you are a GNU Emacs person, you can use 'eww' to browse the web.

That way you can find the websites, copy the URLs and download the files.

Within Emacs, use:

M-x eww

Then browse the web.

Once you are on a page, copy its URL with:

M-x eww-copy-page-url
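
Once the URL is in the kill ring, you can paste it into a terminal and add it to the download list from earlier (a minimal sketch; the placeholder stands for whatever URL you copied):

  echo 'PASTE-THE-COPIED-URL-HERE' >> FilesToDownload.txt
  wget -i FilesToDownload.txt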

There is another great tool for downloading files: 'curl'.

Try it as well and learn a little about it.
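
As a starting point, '-o' names the output file and '-L' follows redirects (a minimal sketch, reusing the PDF URL from above):

  curl -L -o texto-completo.pdf https://www.locus.ufv.br/bitstream/123456789/10320/1/texto%20completo.pdf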

Use R

R has many options to get data from multiple sources.

Check, for example, the function 'fread' from the 'data.table' package, which can read a CSV file directly from a URL.
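
A minimal sketch from the shell, assuming R and the 'data.table' package are installed; it reuses one of the Zenodo URLs from the download list above:

  Rscript -e 'library(data.table); dt <- fread("https://zenodo.org/record/3962046/files/mountain_pastured_cows.csv?download=1"); print(head(dt))'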

References:

  • Check the manual pages for 'wget', 'pandoc' and 'curl':
  man wget
  man pandoc
  man curl