Writing council web site DA scrapers for PlanningAlerts.org.au
I’ve spent some time lately hacking on council website scrapers to pull Development Applications into PlanningAlerts.org.au.
PlanningAlerts aims to…
search as many planning authority websites as it can find and email you details of applications near you. The aim of this is to enable shared scrutiny of what is being built (and knocked down) in people’s communities.
A pretty noble goal, I must say. Until planning authorities themselves become more open and transparent with data, it is up to sites like these to do regular web scraping acrobatics to keep the notifications flowing.
Personally, apart from giving something back to the community, my motivations were mostly selfish:
- learn a bit of Ruby
- become more familiar with git (and github)
- become better at web scraping using Mechanize (I’ve now used the Perl, Python and Ruby versions)
- promote more visibility of the DAs in my area (specifically Lane Cove and Willoughby Councils)
The crew at OpenAustralia are a very welcoming bunch. Having already taken on a suggestion for their ElectionLeaflets site, I was more than happy to accept the invitation to contribute web scrapers. Like most other open source projects, I just signed up to their issue tracker, pulled the source from github and started hacking away.
There are currently two scrapers ready to roll. Assuming, that is, that my newbie Ruby doesn’t make the site maintainers gag. Having spent so much time in Java-land, there might have been a better way to achieve code re-use (I’ll look into this for next time). Time and tolerance to the pain of web scraping permitting, I’ll probably hack together a few more scrapers (Hunters Hill, I’m looking at you).
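The core of any of these scrapers is the same: fetch a council page, pick the DA details out of its HTML, and hand back structured records. The real scrapers use Mechanize for fetching and parsing, but here’s a minimal sketch of the extraction step using only Ruby’s standard-library REXML, against an invented table layout (the markup, DA numbers and addresses below are all hypothetical, not any council’s actual page):

```ruby
require 'rexml/document'

# Hypothetical fragment resembling a council DA listing page. A real
# scraper would fetch this with Mechanize; this markup is invented for
# illustration and kept well-formed so REXML can parse it.
html = <<-HTML
<table>
  <tr><th>Number</th><th>Address</th><th>Description</th></tr>
  <tr><td>DA-2011/042</td><td>1 Example St, Lane Cove</td><td>New carport</td></tr>
  <tr><td>DA-2011/043</td><td>5 Sample Rd, Willoughby</td><td>Alterations</td></tr>
</table>
HTML

doc = REXML::Document.new(html)
applications = []
doc.elements.each('table/tr') do |row|
  cells = row.get_elements('td').map { |td| td.text }
  next if cells.empty? # the header row has <th> cells, not <td>, so skip it
  applications << { :number      => cells[0],
                    :address     => cells[1],
                    :description => cells[2] }
end

applications.each { |a| puts "#{a[:number]}: #{a[:address]}" }
```

In practice the fiddly part isn’t this loop, it’s that every council lays its pages out differently, so each site needs its own selectors and its own quirks handled.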
Like the song says, mashing up is hard to do… or something.
There’s also some indication that the scrapers could be extended to capture more data from hard-to-reach places in more flexible ways. In the meantime, I think I’ll look at putting more bits of my own code up on github.