Download YouTube videos quickly in countries with slow international links

My local ISP recently installed fibre in town, which freed us up from the horror that is 700kbit WiMAX connections. The sales rep came round and enthusiastically encouraged us to upgrade to an "up to 100mbit" plan, which turned out to be shared with the entire town.


So in practice we get about 1mbit for international traffic, though national traffic is pretty fast at 8-25mbit. Google and Akamai have servers in Madagascar so Google services are super fast, Facebook works great and Windows updates come through fairly quickly, but everything else sorta plods along.

Spotify, Netflix and basically anything streaming are out, but YouTube works perfectly, even in HD, as long as you immediately refresh the page after the video first starts playing. It seems that the first time someone loads a video, it immediately gets cached in-country over what I can only assume is a super-secret super-fast Google link. The second time, it loads much quicker.

[Image: first load]

[Image: second load]

This is great in the office, but if you want to load up some videos to take home (internet is way too expensive to have at home) you're going to want to download them. I'm a big fan of youtube-dl, which runs on most OSs and lets you pick and choose your formats. You can start it going, immediately cancel and restart to download at full speed, but you have to do it separately for video and audio and it's generally pretty irritating. So here's a bit of bash script to do it for you!

First install youtube-dl and expect if you don't have them already:

sudo apt-get install youtube-dl expect

Then add something like this to your ~/.bashrc:

yt() {
    expect -c 'spawn youtube-dl -f "bestvideo\[height<=480\]/best\[height<=480\]" -o "/home/user/YouTube/%(title)s.f%(format_id)s.%(ext)s" --no-playlist --no-mtime '"$1"'; expect " ETA " { close }'
    expect -c 'spawn youtube-dl -f "worstaudio" -o "/home/user/YouTube/%(title)s.f%(format_id)s.%(ext)s" --no-playlist --no-mtime '"$1"'; expect " ETA " { close }'
    youtube-dl -f "bestvideo[height<=480]+worstaudio/best[height<=480]" -o "/home/user/YouTube/%(title)s.%(ext)s" --no-playlist --no-mtime "$1"
}

Run bash to reload, then use it like yt <video URL>.

The first two expect commands start downloading the video and audio respectively (I limit mine to 480p or below video and the smallest possible audio, but feel free to change it), killing youtube-dl as soon as they see " ETA " which appears once downloads start. The third command downloads the whole thing once it's been cached in-country.

We include the format ID in the filename for the first two commands because, when downloading video and audio together, youtube-dl adds the format code to the temporary files as title.fcode.ext. When downloading just video or just audio, it doesn't add the code by default. By adding it ourselves, the third command will resume downloading from the existing files and remove them automatically after combining them into one file.

I like to include --no-mtime so the downloaded files' modification date is when they were downloaded, rather than when the video was uploaded. This means I can easily delete them after a month with a crontab entry:

0 21 * * Sun root find /home/user/YouTube/ -type f -mtime +31 -print -delete

Ignore the running as root bit, it's on a NAS so everything runs as root. Woo.

Bash one-liner: Add an Apache directory index to an aria2 download queue

I work in a country with terrible internet, so large downloads through browsers often break part way through. The solution is aria2, a command-line download utility with an optional web UI to queue up downloads. This runs on a server (i.e. a laptop on a shelf) with a few extra config options to make it handle dodgy electricity and dodgy connections a bit better.

A simple crontab entry starts it on boot:

@reboot screen -dmS aria2 aria2c --conf-path=/home/user/.aria2/aria2.conf

The config file /home/user/.aria2/aria2.conf adds some default options:
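The original file isn't reproduced here, but a minimal sketch might look like the following. The option names are standard aria2 ones; the exact values are assumptions:

```ini
# /home/user/.aria2/aria2.conf - a sketch, not the exact file.

# RPC options: let the web UI connect (default port 6800)
enable-rpc=true
rpc-listen-all=true
rpc-secret=secret_token

# Persist the queue across reboots (dodgy electricity)
save-session=/home/user/.aria2/session.txt
input-file=/home/user/.aria2/session.txt
save-session-interval=60

# Be more tolerant of dodgy connections
max-tries=0
retry-wait=30
```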


The three RPC options allow the web UI to connect (port 6800 by default), and the session file lets the download queue persist across reboots (again, dodgy electricity).

Most downloads work fine, but others expire after a certain time, don't allow resuming or only allow a single HTTP request. For these I use a server on a fast connection that acts as a middleman - I can download files immediately there and bring them in later on the slow connection. This is easy enough for single files with directory indexes set up in Apache - right click, copy URL, paste into web UI, download. For entire folders it's a bit more effort to copy every URL, so here's a quick and dirty one-liner you can add to your .bashrc that will accept a URL to an Apache directory index and add every file listed to the aria2 queue.

wget --spider -r --no-parent --level=1 --reject index.html* -nd -e robots=off --reject-regex '(.*)\?(.*)' --user=apache_user --password=apache_password $1 2>&1 | grep '^--' | awk '{ print $3 }' | sed "s/'/%27/" | sed -e '1,2d' | sed '$!N; /^\(.*\)\n\1$/!P; D' | sed 's#^#http://aria2_url:6800/jsonrpc -H "Content-Type: application/json" -H "Accept: application/json" --data \x27{"jsonrpc": "2.0","id":1,"method": "aria2.addUri", "params":["token:secret_token", ["#' | sed 's#$#"], {"pause":"true", "http-user":"apache_user", "http-passwd":"apache_password"}]}\x27#' | xargs -L 1 curl

Add the above to your .bashrc and run bash to reload. Then, to add a directory:


By default this will add downloads paused - see below for more info.

The code is a bit of a mouthful, so here's what each bit does:

wget --spider -r --no-parent --level=1 --reject index.html* -nd -e robots=off --reject-regex '(.*)\?(.*)' --user=apache_user --password=apache_password $1 2>&1

--spider: Don't download anything, just check the page is there (this is later used to provide a list of links to download)
-r --no-parent --level=1: Retrieve recursively, so check all the links on the page, but don't download the parent directory and don't go any deeper than the current directory
--reject index.html*: Ignore the current page
-nd: Don't create a directory structure for downloaded files. wget needs to download at least the index page to check it for links, but by default will mirror the site's directory structure in the current folder. The --spider option deletes the files after they're created, but doesn't delete directories, leaving you with a bunch of useless empty folders. In theory you could instead output to a single temporary file with -O tmpfile, but for some reason this stops wget from parsing for further links.
-e robots=off: Ignore robots.txt in case it exists
--reject-regex '(.*)\?(.*)': ignore any link with a query string - this covers the ones which sort the listing by name, date, size or description
--user=apache_user --password=apache_password: if you're using Basic Authentication to secure the directory listing
$1: feeds in the URL from the shell
2>&1: wget writes to stderr by default, so we redirect all output to stdout

grep '^--' | awk '{ print $3 }' | sed "s/'/%27/" | sed -e '1,2d' | sed '$!N; /^\(.*\)\n\1$/!P; D'

grep '^--': lines containing URLs begin with the date enclosed in two hyphens (e.g. --2017-08-23 12:37:28--), so we match only lines which begin with two hyphens
awk '{ print $3 }': splits each line into columns separated by spaces and outputs only the third one (the URL; the first two columns are the date and time)
sed "s/'/%27/": Apache doesn't urlencode single quote marks in URLs but the script struggles with them, so we convert them to their URL encoded equivalent
sed -e '1,2d': the first two URLs wget outputs are always the directory itself, so we remove the first two lines
sed '$!N; /^\(.*\)\n\1$/!P; D': occasionally you get consecutive duplicate lines coming out, so this removes them. You could use uniq. But this looks more impressive.
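If you want to convince yourself the two behave the same, here's a quick sketch you can run in a shell (the sample input is made up):

```shell
# Both pipelines collapse consecutive duplicate lines: a a b c c -> a b c
printf 'a\na\nb\nc\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D'
printf 'a\na\nb\nc\nc\n' | uniq
```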

sed 's#^#http://aria2_url:6800/jsonrpc -H "Content-Type: application/json" -H "Accept: application/json" --data \x27{"jsonrpc": "2.0","id":1,"method": "aria2.addUri", "params":["token:secret_token", ["#'

Now it all gets a bit rough. We're now creating an expression to feed to curl that will add each download to the start of the queue. We want to run something like this for each line:

curl http://aria2_url:6800/jsonrpc -H "Content-Type: application/json" -H "Accept: application/json" --data '{"jsonrpc": "2.0","id":1,"method": "aria2.addUri", "params":["token:secret_token", [""], {"pause":"true", "http-user":"apache_user", "http-passwd":"apache_password"}]}'

So we use sed once to add the bits before the URL (s#^#whatever# replaces the start of the line). We use # in place of the normal / so it works okay with all the slashes in the URLs, and replace two of the single quotes with their ASCII equivalent \x27 because getting quotes to nest properly is hard and I don't like doing it.

sed 's#$#"], {"pause":"true", "http-user":"apache_user", "http-passwd":"apache_password"}]}\x27#'

We then use sed again to add the bits after the URL (s#$#whatever# replaces the end of the line).

xargs -L 1 curl

Once everything's put together, we feed each line to curl with xargs. A successful addition to the queue looks like this:


Why are downloads added paused?

Due to the limited bandwidth of our office connection, we only run big downloads outside of office hours and restrict speeds to avoid hitting our monthly cap. You can change "pause":"true" to "pause":"false" if you prefer.

To automatically start and stop downloads at certain times, you can add crontab entries to the server you host aria2 on:

# Pause aria2 downloads at 8am and 2pm, but remove the speed limit
0 8,14 * * 1-5 curl http://localhost:6800/jsonrpc -H "Content-Type: application/json" -H "Accept: application/json" --data '{"jsonrpc": "2.0","id":1, "method": "aria2.pauseAll", "params":["token:secret_token"]}'
0 8,14 * * 1-5 curl http://localhost:6800/jsonrpc -H "Content-Type: application/json" -H "Accept: application/json" --data '{"jsonrpc": "2.0","id":1, "method": "aria2.changeGlobalOption", "params":["token:secret_token",{"max-overall-download-limit":"0"}]}'

# Resume downloads at 12pm and 5pm but limit speed to 80KB/s
0 12,17 * * 1-5 curl http://localhost:6800/jsonrpc -H "Content-Type: application/json" -H "Accept: application/json" --data '{"jsonrpc": "2.0","id":1, "method": "aria2.unpauseAll", "params":["token:secret_token"]}'
0 12,17 * * 1-5 curl http://localhost:6800/jsonrpc -H "Content-Type: application/json" -H "Accept: application/json" --data '{"jsonrpc": "2.0","id":1, "method": "aria2.changeGlobalOption", "params":["token:secret_token",{"max-overall-download-limit":"80K"}]}'
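If you want to check what the queue is up to from the command line, the same JSON-RPC interface exposes aria2.getGlobalStat (a standard aria2 method; aria2_url and secret_token are the same placeholders used above):

```shell
# Ask aria2 for overall stats: download/upload speed, number of active,
# waiting and stopped downloads.
curl http://aria2_url:6800/jsonrpc \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  --data '{"jsonrpc": "2.0", "id": 1, "method": "aria2.getGlobalStat", "params": ["token:secret_token"]}'
```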


  • wget --spider will download text files, and any file missing a Content-Type header, in order to check them for further links. Apache serves a header for most common types but does miss a few, and the DefaultType option has been deprecated so you can't set, say, application/octet-stream for anything unknown. It's therefore sensible to run this script on the server hosting the directory indexes, so you're not waiting on those downloads (even though they're deleted immediately afterwards).

Laptop mysteriously turns on overnight: Logitech to blame

Something's been puzzling me for the past few weeks. At the end of each day I hibernate my laptop, stick it in my bag, and take it home. When I turn it on the next day, it tells me it powered off because the battery reached a critical level, and the battery has dropped to 3% (the shutdown threshold) from its original 100%. What gives?

I couldn't figure out whether the battery was draining itself overnight, or whether the computer was turning itself back on somehow. Luckily I have the terrible habit of falling asleep on the sofa (well, piece-of-sponge-with-some-slats) so at 3 o'clock one morning I caught it turning itself on.


Auto power-on wasn't configured in the BIOS and there was nothing plugged into the LAN port to wake it up. What had changed in the past few weeks?

[Image: Logitech Unifying Receiver]

I should really clean that screen hinge.

I have a Logitech Unifying Receiver for my wireless mouse, and I had recently made the apparently highly important decision that it was probably safer to leave it plugged in all the time rather than pull it out every day so it didn't get bashed up in my bag (turns out they pull apart quite easily, and I'm 6,000 miles from a replacement). Was this the culprit?

Windows includes a handy utility to find out what devices are configured to wake a computer, powercfg. You can run powercfg /devicequery wake_armed in a command prompt:

C:\Users\Michael>powercfg /devicequery wake_armed
HID Keyboard Device (001)
Intel(R) 82579LM Gigabit Network Connection
HID-compliant mouse (002)
Logitech HID-compliant Unifying Mouse

You can also run powercfg /lastwake to find out what device last woke the computer, but since I didn't run it until the subsequent startup, this wasn't very useful. So, keyboard, mouse and the ethernet connection. The ethernet connection is out, since there's nothing plugged into it. If we go to Device Manager, the HID devices are listed under Keyboards and Mice:

[Screenshot: Keyboards and Mice in Device Manager]

Double-clicking each of them in turn (apart from the built-in keyboard, listed as Standard PS/2 Keyboard, and the trackpad, listed as ThinkPad UltraNav Pointing Device (what a name!)) and going to the Power Management tab showed that each was configured to wake the computer. I don't have a keyboard connected to the receiver, but I unchecked them all just to be sure. If you're not sure which devices correspond to the Logitech receiver, go to Details and select the Hardware Ids property. My receiver shows a VID of 046D and a PID of C52B, but if yours are different you can Google them to find out what manufacturer and model they correspond to.

[Screenshot: the "Allow this device to wake the computer" checkbox]

Rerunning the powercfg command above now shows that only the ethernet adapter can wake up the computer:

C:\Users\Michael>powercfg /devicequery wake_armed
Intel(R) 82579LM Gigabit Network Connection

Problem solved!


Fix: iTunes won’t play audio after switching sound device on Windows

Just a quick one.

If you're using the generic Microsoft drivers for audio on your laptop, you might notice that you have separate audio devices for the built-in speakers and for headphones:

When you don't have any headphones plugged in, your default device will be the speakers; when you plug headphones in, your default device changes to the headphones. All well and good.

Most applications aren't bothered by this change in sound device, and will happily keep playing through the new default. iTunes, however, has some issues with this process, and will just sorta hover there with the playback bar not moving and no sound coming out. When you restart it, everything works great, but who wants to do that every time they plug their headphones in?

The solution is surprisingly simple. In iTunes, click the menu icon, choose Preferences, and go to the Playback tab. The Play Audio Using option will be set to Windows Audio Session. Change it to Direct Sound, hit OK and restart iTunes for what is hopefully the final time.

And that's it!

Setting up the Xbox 360 DVD remote with OpenELEC

I've recently moved house and so have inherited a new(ish) TV. The TV I was using before had a remote with a set of unused media buttons at the bottom, which I repurposed to control OpenELEC on my Raspberry Pi. Since the new remote doesn't have any buttons to spare, I had to give the Pi one of its very own. I had a look round and eventually settled upon the Xbox 360 DVD remote, which I picked up on eBay for an entirely reasonable three pounds - I expected to get a Chinese clone at that price but was pleasantly surprised to find that it turned out to be genuine! I remember setting up the old remote being fairly involved so I'm making it into a start-to-finish tutorial this time round.

Note: This tutorial was written for OpenELEC 3.2.4. If you're using a different version, some things might be different (particularly the paths, if you're using Raspbmc or stock XBMC instead).

Continue reading

USB tethering with Nokia N9 on Windows

After a few days of internet troubles at work, I decided to attempt USB tethering with my Nokia N9 before Facebook withdrawal killed me (I'd browse on mobile but the only place I get signal is hanging off my desk which makes typing a bit awkward). This is a little more involved than on other platforms - if you have wifi you can use the included hotspot app, but I couldn't be bothered to walk the whole 15 minutes home to grab a wireless card. I knew that the SDK app you get when you enable developer mode (you have done this, right? Settings -> Security -> Developer Mode and hit the button) lets you set up a network over USB so you can SSH to the N9, and figured I could simply set up an SSH tunnel and proxy all my PC traffic through that. Course, it's never that easy.

Continue reading

AirPlay is rubbish: better audio streaming to XBMC

Edit 15/05/13: since the release of OpenELEC 3.0.2, AirPlay has been working perfectly (and most of the other bugs have vanished too). Or maybe it's that I wired everything up with ethernet when 3.0.1 came out and wireless stopped working. And by "wired" I mean "nailed the cable to my doorframe". Either way, brilliant work from the XBMC and OpenELEC teams!

The Raspberry Pi is great. For about £25, and with the help of OpenELEC, I've got my whole media library hooked up to my TV - all in all, about 1.5TB worth of entertainment.

Yeah, the speaker placement sucks.

It consolidates all the different locations (in my case, hard drives and SMB shares) into a single library for each of TV and Movies, so I can chuck someone my phone or sit them in front of a browser and everything's right there in one place. Makes choosing what to watch much easier!

Continue reading

Cheap Chinese eBay antennas – a complete waste of money?

I live in a zoo. For the most part this is super cool, but it's in the middle of nowhere and the internet access leaves a bit to be desired. I started using a 3G dongle, but the signal is rubbish and it drops out whenever it rains.

[Image: 3G signal indicator]

Zero bars, aww yeah

I started looking into ways to boost the signal, discovering that many mobile broadband users in rural Australia use Yagi antennas. These are highly directional and, if used correctly, can pick up signals from towers several kilometres away. I didn't have the tools available to DIY one, so turned to eBay for a sketchy Chinese alternative. What I got was this:

Continue reading