Archive for the 'Web' Category
Download (and play) all videos in your YouTube playlist
Last time I wrote a program to download all the videos in my playlists. Wouldn't it be nice if I could watch the videos one by one while the downloads are still running? All we have to do is start two threads: one downloads the videos and puts them in a queue, while the other takes videos from the queue and plays them.
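import threading
try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2

def download_and_play(video_lists):
    # download_list_videos and play_video are not shown here
    q = Queue(32)
    threads = []
    dt = PDThread(func=download_list_videos, args=(video_lists, q))
    pt = PDThread(func=play_video, args=(q,))
    threads.extend([dt, pt])
    for t in threads:
        t.start()
    for t in threads:
        t.join()

class PDThread(threading.Thread):
    """Thread that runs func(*args) and keeps the return value in self.res."""
    def __init__(self, func, args):
        threading.Thread.__init__(self)
        self.func = func
        self.args = args

    def run(self):
        # apply() is deprecated (and gone in Python 3); call the function directly
        self.res = self.func(*self.args)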
Update: The source code is now hosted in the Google Code project youtube-playlists-videos-download.
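The two-thread arrangement described in this post can be tried without any real downloads. The following is only a minimal sketch for Python 3: the names download_worker, play_worker and the _DONE sentinel are made up for illustration, and the fetch and play callables are stand-ins for a real downloader and a real video player. Using a sentinel to tell the player when to stop is my assumption, not necessarily what the project does.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel: tells the player no more videos will arrive


def download_worker(urls, q, fetch):
    """Producer: fetch each URL and queue the resulting file name."""
    for url in urls:
        q.put(fetch(url))  # blocks while the queue is full
    q.put(_DONE)


def play_worker(q, play):
    """Consumer: play queued files in order until the sentinel shows up."""
    while True:
        item = q.get()  # blocks until a download has finished
        if item is _DONE:
            break
        play(item)


def download_and_play(urls, fetch, play, maxsize=32):
    """Run one downloading thread and one playing thread over a shared queue."""
    q = queue.Queue(maxsize)
    threads = [
        threading.Thread(target=download_worker, args=(urls, q, fetch)),
        threading.Thread(target=play_worker, args=(q, play)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    played = []
    download_and_play(
        ["video-a", "video-b", "video-c"],
        fetch=lambda url: url + ".flv",  # stand-in for a real download
        play=played.append,              # stand-in for launching a player
    )
    print(played)
```

Because there is a single producer and a single consumer, the videos are played in the same order in which they were downloaded, and the bounded queue keeps the downloader from running too far ahead of the player.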
Download all videos in your YouTube playlist
Sometimes I want to watch the videos in my own, or even my friend's, YouTube playlists on my laptop while I am offline. So I googled around and found some tools for downloading the videos. Most of them, whether web-based or Firefox plugins, only let me download one video at a time; the exception is youtube-dl. So the problem becomes how to extract the video URLs (or YouTube ids) from my playlists and feed them to youtube-dl, and fortunately the YouTube API makes that easy. The first step is to download http://www.arrakis.es/~rggi3/youtube-dl/youtube-dl and rename it to youtubeDL.py so that it can be reused as a Python module. Then we can start extracting the video links. Here is a quick implementation.
import feedparser, urllib, re, sys
# download http://www.arrakis.es/~rggi3/youtube-dl/youtube-dl and rename it to youtubeDL.py
from youtubeDL import FileDownloader, YoutubeIE, MetacafeIE, YoutubePlaylistIE, DownloadError

def retrieve_playlist(username):
    """Return a list of dicts (title, id_num, pages), one per playlist of the user."""
    playlists_url = 'http://gdata.youtube.com/feeds/api/users/%s/playlists' % username
    feed = feedparser.parse(playlists_url)
    playlists = []
    for en in feed.entries:
        title = en.title
        id_num = en.id.split('/')[-1]
        pages = gen_playlist_pages(id_num)
        playlists.append(dict(title=title, id_num=id_num, pages=pages))
    return playlists

def gen_playlist_pages(id_num):
    """Build the feed URLs for four pages of 50 entries each (up to 200 videos)."""
    page = 'http://gdata.youtube.com/feeds/api/playlists/%s' % id_num
    pages = []
    for i in range(4):
        # urllib.urlencode is Python 2; on Python 3 it is urllib.parse.urlencode
        params = urllib.urlencode({'start-index': 1 + 50 * i, 'max-results': 50})
        _page = '%s?%s' % (page, params)
        pages.append(_page)
    return pages

def get_video_links_from_playlists(playlists):
    """Add a 'video_links' list to every playlist dict."""
    video_lists = []
    for pl in playlists:
        video_links = []
        for p in pl['pages']:
            feed = feedparser.parse(p)
            for en in feed.entries:
                if re.search(r'watch', en.link):
                    video_links.append(en.link)
        pl.update(dict(video_links=video_links))
        video_lists.append(pl)
    return video_lists

def download_videos(video_lists):
    youtube_ie = YoutubeIE()
    metacafe_ie = MetacafeIE(youtube_ie)
    youtube_pl_ie = YoutubePlaylistIE(youtube_ie)
    retcode = 0
    for vl in video_lists:
        # each playlist gets its own directory, named after the playlist title
        outtmpl = vl.get('title', 'no_playlist_title') + u'/%(stitle)s-%(id)s.%(ext)s'
        fd = FileDownloader({'outtmpl': outtmpl})
        fd.add_info_extractor(youtube_pl_ie)
        fd.add_info_extractor(metacafe_ie)
        fd.add_info_extractor(youtube_ie)
        try:
            retcode = fd.download(vl.get('video_links'))
        except DownloadError:
            # yes, we should handle this... maybe later
            retcode = 1
    # exit only after every playlist has been tried
    sys.exit(retcode)

if __name__ == '__main__':
    pls = retrieve_playlist('your_youtube_username_here')
    video_list = get_video_links_from_playlists(pls)
    download_videos(video_list)
Just put youtubeDL.py and this script, saved as playlists-dl.py, in the same directory, change 'your_youtube_username_here' to your user name, and run python playlists-dl.py. All the video clips in all your playlists will then be downloaded :).
Todo:
1. Let the user specify the username on the command line.
2. If the user specifies playlist ids, download only the videos in those playlists.
3. Use multiple threads to reduce the total download time.
4. Play the downloaded videos while other downloads are still going on …
5. …
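The first two todo items could look roughly like this. It is only a sketch for Python 3 using the standard argparse module; the option name --playlist and the helper select_playlists are made up for illustration, and the playlist dicts simply mimic the title/id_num shape built by retrieve_playlist.

```python
import argparse


def parse_args(argv=None):
    """Parse the user name and an optional, repeatable playlist id."""
    parser = argparse.ArgumentParser(
        description="Download the videos in a user's YouTube playlists.")
    parser.add_argument("username", help="YouTube user name")
    parser.add_argument(
        "-p", "--playlist", action="append", metavar="ID",
        help="only download this playlist id (may be repeated)")
    return parser.parse_args(argv)


def select_playlists(playlists, wanted_ids):
    """Keep only the requested playlists; keep all of them if none were requested."""
    if not wanted_ids:
        return playlists
    wanted = set(wanted_ids)
    return [pl for pl in playlists if pl["id_num"] in wanted]


if __name__ == "__main__":
    # Made-up arguments and playlists, just to show the filtering.
    args = parse_args(["someuser", "-p", "PL2"])
    playlists = [
        {"title": "Rock", "id_num": "PL1"},
        {"title": "Jazz", "id_num": "PL2"},
    ]
    print(args.username, select_playlists(playlists, args.playlist))
```

In the real script, the result of select_playlists would be passed on to get_video_links_from_playlists instead of being printed.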
Cuil - a new search engine developed by ex-Googlers.
The search results page is prettified with icons and Ajax-style categories.
XTimeline
Besides MIT SIMILE Timeline, I just discovered XTimeline, an even easier way to build a timeline on a web page. Well, I mean easier if you do it manually. To try it out, I used an article I found through Google, Timeline: Countdown to the 2010 Olympic Games. Here is how it looks for the first three events.
Music Recommendations by Audioscrobbler
I have been reading Programming Collective Intelligence by Toby Segaran recently. It's quite interesting and inspiring. In chapter 2, the author talks about how to make recommendations using the preference data of a group of people. Usually we collect the rating data, implement a reasonable algorithm to calculate a metric (or score) from that data, and then make recommendations in order of that score.
For example, if I want to extend my music collection, asking other people who share my favorite band might be the best bet. Thanks to Audioscrobbler, the Last.fm web service, we can collect this data quite easily. With pyscrobbler, a set of Python bindings to the Audioscrobbler APIs based on ElementTree, we can write a function to get the rating data and calculate the score:
from audioscrobbler import AudioScrobblerQuery
import operator, sys

# n: total number of bands to recommend.
def getRecommendations(favoriteBand, n=10):
    # Audioscrobbler returns 50 fans by default, so we use 50 as the full score.
    FULLSCORE = 50
    fans = [f.element().get('username')
            for f in AudioScrobblerQuery(artist=favoriteBand).fans()]
    bands = {}
    for f in fans:
        for a in AudioScrobblerQuery(user=f).topartists()[:FULLSCORE]:
            name = str(a.name)
            rank = int(str(a.rank))
            # rank #1 gets score=FULLSCORE, rank #2 gets score=FULLSCORE-1, etc.
            score = FULLSCORE - rank + 1
            if name in bands:
                bands[name] += score
            else:
                bands[name] = score
    # Do not return the artist the user just passed in
    # (pop avoids a KeyError when that artist is not in the dict).
    bands.pop(favoriteBand, None)
    recom = sorted(bands.items(), key=operator.itemgetter(1), reverse=True)
    return recom[:n]
Here are two example runs of this function in the IPython shell:
In [3]: getRecommendations('U2')
Out[3]:
[('Red Hot Chili Peppers', 757),
 ('Coldplay', 669),
 ('The Beatles', 532),
 ('Nirvana', 423),
 ('Aerosmith', 409),
 ('R.E.M.', 401),
 ('Moby', 381),
 ('Queen', 379),
 ('Pink Floyd', 378),
 ('Green Day', 376)]
In [4]: getRecommendations('Belle and Sebastian')
Out[4]:
[('The Beatles', 960),
 ('Radiohead', 829),
 ('The Smiths', 580),
 ('Cat Power', 529),
 ('The Arcade Fire', 478),
 ('The White Stripes', 469),
 ('Elliott Smith', 431),
 ('of Montreal', 424),
 ('The Shins', 411),
 ('Bob Dylan', 390)]
Judging by my years of listening to rock music, I think the results are quite impressive.
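The scoring rule itself (rank #1 earns the full score, rank #2 one point less, and so on, summed over all fans) can be tried offline without the web service. This is a small Python 3 sketch under that assumption; the function name recommend and the listening data are made up for illustration and do not come from Audioscrobbler.

```python
from collections import defaultdict


def recommend(top_artists_by_fan, favorite, n=10, full_score=50):
    """Rank artists by the sum of (full_score - rank + 1) over all fans."""
    scores = defaultdict(int)
    for ranked_artists in top_artists_by_fan.values():
        for rank, name in enumerate(ranked_artists[:full_score], start=1):
            scores[name] += full_score - rank + 1
    scores.pop(favorite, None)  # do not recommend the band that was passed in
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up data: each fan's top artists, best first.
    fans = {
        "fan_a": ["U2", "Coldplay", "Queen"],
        "fan_b": ["Coldplay", "U2", "Moby"],
        "fan_c": ["Queen", "Coldplay"],
    }
    print(recommend(fans, "U2", n=3))
    # prints [('Coldplay', 148), ('Queen', 98), ('Moby', 48)]
```

Coldplay wins here because it sits near the top of every fan's list (49 + 50 + 49), which is exactly the effect the real function relies on.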
Sustainable transportation in Canada: An example of Exhibit map view
The Statistics Canada census includes results on the mode of transportation used by the employed labour force in different areas of Canada. Although the data is quite straightforward, it would be nice to have a map view of it, which would let us easily see the geographic differences. It turns out that this is an interesting subject for a Google Maps mashup.
With Exhibit 2.0, a JavaScript library that makes creating interactive web pages easy, all that is left to do is to dump the raw CSV data, together with the latitude and longitude of each location, into the proper JSON format. With geopy, simplejson and Python's built-in csv module, here is an implementation:
import csv, re
from geopy import geocoders
import simplejson

COLUMN = ["geocode", "place_of_residence", "total_Mode_of_transportation",
          "car_truck_or_van_as_driver", "car_truck_or_van_as_passenger",
          "total_Sustainable_transportation", "public_transit", "walked",
          "bicycle", "other"]
GMAPKEY = "Google_Map_API_Key_here"

def transformcsv2json(file, jsonfile='output.js'):
    reader = csv.reader(open(file))
    column = COLUMN + ['city', 'province', 'latlng', 'label']
    items = []
    fc = 0  # number of places that could not be geocoded

    def getlatlng(place):
        # For some unknown reason, geocoding sometimes fails several times before it succeeds.
        for i in range(25):
            try:
                g = geocoders.Google(GMAPKEY)
                # keep `place` unchanged so that a retry sends the same query
                _, (lat, lng) = g.geocode(place)
                return str(lat) + ', ' + str(lng)
            except Exception:
                print("%s Warning! %s: geocoding failed %d time(s)" % ("|" * 60, place, i + 1))
        raise Exception("geocoding failed for " + place)

    for row in reader:
        city, province = mapLocationName(row[1])
        if city and province:
            place = city + " " + province
            try:
                latlng = getlatlng(place)
                rowplus = row + [city, province, latlng, place]
                transItem = dict(zip(column, rowplus))
                items.append(transItem)
            except Exception:
                # Note the place that failed to geocode, but carry on with the next data row.
                print("%s Error! %s geocoding failed!" % ("?" * 60, place))
                fc += 1
    print("%s total number of geocoding failures: %d" % ("#" * 60, fc))
    f = open(jsonfile, 'w')
    f.write(simplejson.dumps({'items': items}, ensure_ascii=False))
    f.close()  # the original had f.close without parentheses, which never closed the file

def mapLocationName(location):
    # Statistics Canada's province abbreviations differ from the ones Google Maps uses.
    mapP = {'Alta.': 'AB', 'B.C.': 'BC', 'Man.': 'MB', 'N.B.': 'NB', 'N.L.': 'NL',
            'N.S.': 'NS', 'N.W.T.': 'NT', 'N.U.': 'NU', 'Ont.': 'ON', 'P.E.I.': 'PE',
            'Que.': 'QC', 'Sask.': 'SK', 'Y.T.': 'YT'}
    try:
        city, province = location.split(',')
        province = re.findall(r'\((.*)\)', province)[0]
        if re.search(r'\/', province):
            province = province.split('/')[0]
        province = mapP[province]
    except (ValueError, IndexError, KeyError):
        # no comma, no "(...)" part, or an unknown abbreviation
        city = location
        province = ''
        print("Warning! %s parsing failed" % location)
    return city, province

if __name__ == '__main__':
    transformcsv2json('placeofresidence.csv', 'placeofresidence.js')
Here is the resulting map mashup.
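To make the data flow easier to see, here is a stripped-down Python 3 sketch of the same CSV-to-JSON step, using only the standard library (json instead of simplejson) and no network access. The shortened column list, the sample row and the fake_lookup function are made up for illustration; only the output shape, a dict with an 'items' list whose entries carry 'label' and 'latlng', follows the script in this post.

```python
import csv
import io
import json

# Shortened, hypothetical column list; the real script uses ten columns.
COLUMNS = ["geocode", "place_of_residence", "public_transit", "walked", "bicycle"]


def rows_to_exhibit(csv_text, lookup_latlng):
    """Turn CSV rows into an {'items': [...]} structure with label and latlng."""
    items = []
    for row in csv.reader(io.StringIO(csv_text)):
        item = dict(zip(COLUMNS, row))
        place = item["place_of_residence"]
        lat, lng = lookup_latlng(place)
        item["label"] = place                   # name of the item
        item["latlng"] = "%s, %s" % (lat, lng)  # "lat, lng" string for the map view
        items.append(item)
    return {"items": items}


def fake_lookup(place):
    """Stand-in for a geocoder: always returns the same made-up coordinates."""
    return (45.0, -75.0)


if __name__ == "__main__":
    sample = "001,Exampleville,1200,800,150\n"  # made-up row
    print(json.dumps(rows_to_exhibit(sample, fake_lookup), indent=2))
```

Passing the geocoder in as a function keeps the conversion testable offline; the real geocoding call, with its retries, can be plugged in later.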
How to download Flickr pictures with specific tags and use them as screensaver slides
Flickr Leech is a slick site I often visit. It lets you browse many pictures at once by favorite tags, user name, interestingness, etc. Today, when I was looking at all those beautiful pictures from all over the world again, an idea came to me: maybe I could download those pictures automatically and use them as my screensaver slides? So I started to dig into the Flickr API, and it turns out the answer is indeed yes. Here is my quick hack in Python: flickrDownload.py. After executing
python flickrDownload.py dog 200
I started downloading the 200 most popular pictures tagged with “dog” from Flickr. Then I opened F-Spot in Ubuntu, tagged all these pictures with dog,
and went to Edit->Preferences to set them as my screensaver slides.
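flickrDownload.py itself is not reproduced here, so the following Python 3 sketch only shows how such a request could be put together; it is not the script's actual code. It builds a flickr.photos.search URL for a tag and turns one entry of the result into an image URL. Treating "most popular" as sort=interestingness-desc is my assumption, and the API key and photo values in the example are placeholders.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def search_url(api_key, tag, count):
    """Build a flickr.photos.search request for up to `count` photos of a tag."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": tag,
        "sort": "interestingness-desc",  # assumption: "most popular" means most interesting
        "per_page": count,               # Flickr allows at most 500 per page
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


def photo_url(photo):
    """Image URL for one photo entry (needs its server, id and secret fields)."""
    return "https://live.staticflickr.com/%(server)s/%(id)s_%(secret)s.jpg" % photo


if __name__ == "__main__":
    print(search_url("YOUR_API_KEY", "dog", 200))
    # Placeholder values, not a real photo.
    print(photo_url({"server": "1234", "id": "5678", "secret": "abcdef"}))
```

A downloader would fetch the search URL, parse the JSON response, and save the file behind each photo_url into the screensaver directory.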
