I don't know how I could have missed something like this; I probably heard of it but ignored it. Today I spent a lot of time reading up on the Internet Archive, and I am impressed. People talk about the Google cache, but the Internet Archive is just outstanding. Built largely from data crawled by Alexa, it has snapshots of websites from 1996 to today! For example, I pulled up my website's data and it showed snapshots of how it used to look in 2004, with almost all of the links still functional.
e.g. bhasker.net from February 2004
That's how my website looked back in February 2004! It was just amazing and fun to be able to see how it was then and how it is now. Strangely enough, I think they stopped crawling my website in 2005 for some reason. Anyway, I then went on to look at snapshots of yahoo.com, and they have pages of yahoo.com archived all the way back to 1996, with almost daily snapshots from 1999/2000.
The very scope of this project is mind-blowing. It gives a nostalgic feeling to see how far the web has come in the last 12 years. I still remember the early days of the Internet in India, when we used VSNL shell accounts in 1995. Heck, I remember using ERNET networks to browse the net, which before August 1995 was the only way to get onto the Internet in India. BBSes were all the rage then. Linux was in its infancy, Windows 95 was the coolest OS out there, and OS/2 Warp 4 was trying hard to compete. So much has changed, and it's still changing so rapidly. The concept of the Internet Archive is just beautiful, and I hope they keep up the good work. The system is very aptly named the Wayback Machine! It really is a walk down the history of the Internet as we know it.
I will try to post updates as and when I come across some interesting history. Time to go on a journey through time!
P.S. By the way, it's not just text data that they are archiving; there are also archives of movies, cartoons, audio, music, etc.
