
Wikipedia Watchlist RSS

Wikipedia’s watchlist has an RSS feed, which is a nice way to keep track of changes to the articles you watch. However, the feed doesn’t seem to conform to the RSS standard, and it confuses some feed readers, notably Opera M2.

Here’s a simple PHP script that you can put on your website to fix the feed. It gives each feed item a GUID derived from an MD5 hash of the item’s contents. You’ll need to fill in the details of your own feed URL (which you can get by subscribing to your watchlist), since that URL contains your secret API key.

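<?php
// Your personal watchlist feed; wltoken is the secret key from your feed URL.
$rss_url = 'http://en.wikipedia.org/w/api.php'
	. '?action=feedwatchlist'
	. '&allrev=allrev'
	. '&hours=72'
	. '&wlowner=YOUR_USERNAME_HERE'
	. '&wltoken=YOUR_API_KEY_HERE'
	. '&feedformat=rss';

// Fetch the original feed (requires wget on the server).
$feed = `wget -q -O - '$rss_url'`;
$rows = explode("\n", $feed);

header('Content-type: application/rss+xml');
$alldata = '';
foreach ($rows as $row)
{
	// A new item starts: reset the accumulated item text.
	if (strpos($row, '<item>') !== false)
		$alldata = '';
	// Drop the feed's own <guid> line; a replacement is emitted below.
	if (strpos($row, '<guid>') !== false)
		continue;
	$alldata .= $row;
	// At the end of the item, insert a GUID hashed from the item's content.
	if (strpos($row, '</item>') !== false)
		$row = '<guid isPermaLink="false">'
			. md5($alldata)
			. "</guid>\n"
			. $row;
	echo $row . "\n";
}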

Import Wikipedia page history to git

I’ve written a small tool that downloads the history of a Wikipedia article, converts it, and imports it into a new git repository. My main motivation was to be able to run a per-line blame on the article’s history. I had tried levitation, but it seemed to be aimed at large imports (or it may simply be buggy): given the history of just one article, it tried to create huge binary files and ran for longer than my patience allowed. I also wanted the tool to handle the downloading and importing itself, so that a git repository of any Wikipedia article would be a single command away.
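The post doesn’t say how the tool feeds the revisions to git. One common approach is git’s fast-import stream format, where each revision becomes a commit that replaces the article file. The sketch below shows that idea in PHP; it is not the tool’s actual (D) implementation, and the function names, the revision array layout (user, timestamp in Unix seconds, comment, text) and the article.txt file name are all hypothetical.

```php
<?php
// Hypothetical sketch: build a `git fast-import` stream from an article's
// revisions, oldest first. The revision array layout and the file name
// are illustrative assumptions, not the tool's real data model.

// A fast-import "data" command: exact byte count, raw bytes, optional LF.
function fi_data(string $payload): string
{
    return 'data ' . strlen($payload) . "\n" . $payload . "\n";
}

function revisions_to_fast_import(array $revisions, string $path = 'article.txt'): string
{
    $out = '';
    foreach ($revisions as $rev) {
        // Wikipedia accounts have no public e-mail address, so derive a
        // placeholder one from the username.
        $email = str_replace(' ', '_', $rev['user']) . '@wikipedia.invalid';
        $ident = "{$rev['user']} <$email> {$rev['timestamp']} +0000";
        // Without a "from" line, each commit after the first continues
        // from the current tip of the branch, giving a linear history.
        $out .= "commit refs/heads/master\n";
        $out .= "author $ident\n";
        $out .= "committer $ident\n";
        $out .= fi_data($rev['comment']);   // edit summary as commit message
        $out .= "M 644 inline $path\n";     // replace the whole file...
        $out .= fi_data($rev['text']);      // ...with this revision's text
    }
    return $out;
}
```

If a script echoed this string, piping it into git fast-import inside a freshly initialised repository and then running git checkout master would leave article.txt in the working tree, ready for git blame.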

The tool could be made faster (all the XML and string handling adds overhead), but for now it’s fast enough for me. One possible optimization is to avoid loading the entire input XML into memory, since the conversion can be done by “streaming” the XML instead. Another limitation is that the tool is currently hard-wired to the English Wikipedia.
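To illustrate what “streaming” the XML means, here is a minimal PHP sketch using the XMLReader extension: it walks a MediaWiki XML export node by node and hands each revision to a callback as soon as its closing tag is seen, so only one revision is held in memory at a time. It assumes the standard export element names (revision, timestamp, username or ip, comment, text); the function name and the shape of the revision array are hypothetical.

```php
<?php
// Hypothetical sketch: read revisions from a MediaWiki XML export one at a
// time with XMLReader, so memory use stays flat however long the history.

// Export element name => key in the revision array passed to the callback.
const FIELD_MAP = [
    'username'  => 'user',  // registered contributor
    'ip'        => 'user',  // anonymous contributor
    'timestamp' => 'timestamp',
    'comment'   => 'comment',
    'text'      => 'text',
];

function stream_revisions(string $file, callable $onRevision): int
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }
    $count = 0;
    $rev = null; // revision currently being collected, if any
    while ($reader->read()) {
        $name = $reader->localName; // ignores the export's XML namespace
        if ($reader->nodeType === XMLReader::ELEMENT) {
            if ($name === 'revision') {
                $rev = ['user' => '', 'timestamp' => '', 'comment' => '', 'text' => ''];
            } elseif ($rev !== null && array_key_exists($name, FIELD_MAP)) {
                // Text content of this element, with entities decoded.
                $rev[FIELD_MAP[$name]] = $reader->readString();
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $name === 'revision') {
            $onRevision($rev); // hand the revision off, then forget it
            $rev = null;
            $count++;
        }
    }
    $reader->close();
    return $count;
}
```

Each callback invocation could feed one commit to the importer, instead of building a full in-memory tree of the whole export first.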

Requires curl and (obviously) git. You’ll need a D2 compiler to compile the code.

August 2013 update: Updated to D2. Now creates the directory automatically. Added --keep-history switch.

Source, Windows binary.