Some good books in Python
# Some points about writing great computer books were also mentioned.
Last time I wrote a program to download all the videos in my playlists. Wouldn't it be nice if I could watch those videos one by one while they are still downloading? All we have to do is start two threads: one downloads the videos and puts them in a queue, and the other takes videos from the queue and plays them.
    import threading
    from Queue import Queue  # Python 2 module name

    def download_and_play(video_lists):
        q = Queue(32)  # holds at most 32 videos waiting to be played
        # download_list_videos and play_video are not shown here
        dt = PDThread(func=download_list_videos, args=(video_lists, q))
        pt = PDThread(func=play_video, args=(q,))
        threads = [dt, pt]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    class PDThread(threading.Thread):
        def __init__(self, func, args):
            threading.Thread.__init__(self)
            self.func = func
            self.args = args

        def run(self):
            # same as the deprecated apply(self.func, self.args)
            self.res = self.func(*self.args)
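The two-thread arrangement can also be tried without any real downloads. Below is a minimal sketch, written for Python 3 (where the module is called `queue`), in which `fake_download` and `fake_play` are hypothetical stand-ins for the real download and playback functions. The `DONE` sentinel that tells the player thread when to stop is my own assumption; the listing above does not show how the player finishes.

```python
import queue
import threading

DONE = object()  # sentinel: signals that no more videos will arrive


def fake_download(names, q):
    """Producer: pretend to download each video, then put it on the queue."""
    for name in names:
        q.put(name)  # blocks while the queue is full
    q.put(DONE)


def fake_play(q, played):
    """Consumer: take videos off the queue in order until the sentinel arrives."""
    while True:
        item = q.get()  # blocks until an item is available
        if item is DONE:
            break
        played.append(item)


def download_and_play_demo(names):
    q = queue.Queue(maxsize=32)
    played = []
    threads = [
        threading.Thread(target=fake_download, args=(names, q)),
        threading.Thread(target=fake_play, args=(q, played)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return played


if __name__ == "__main__":
    print(download_and_play_demo(["a.flv", "b.flv", "c.flv"]))
```

Because a single producer feeds a single consumer through a FIFO queue, the videos are played in the order they were downloaded.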
Update: the source code is now on Google Code, in the project youtube-playlists-videos-download.
Sometimes I want to watch the videos in my own, or even my friend’s, YouTube playlists on my laptop while I am offline. So I googled around and found some tools for downloading the videos. Most of them, whether web-based or Firefox plugins, only let me download one video at a time; the exception is youtube-dl. So the problem becomes how to extract the video URLs (or YouTube ids) from my playlists and feed them to youtube-dl, and fortunately the YouTube API makes that easy. The first step is to download http://www.arrakis.es/~rggi3/youtube-dl/youtube-dl and rename it to youtubeDL.py so it can be reused as a Python module. Then we can start extracting the video links. Here is a quick implementation.
    import feedparser, urllib, re, sys
    # Download http://www.arrakis.es/~rggi3/youtube-dl/youtube-dl and rename it to youtubeDL.py
    from youtubeDL import FileDownloader, YoutubeIE, MetacafeIE, YoutubePlaylistIE, DownloadError

    def retrieve_playlist(username):
        playlists_url = 'http://gdata.youtube.com/feeds/api/users/%s/playlists' % username
        feed = feedparser.parse(playlists_url)
        playlists = []
        for en in feed.entries:
            title = en.title
            id_num = en.id.split('/')[-1]
            pages = gen_playlist_pages(id_num)
            playlists.append(dict(title=title, id_num=id_num, pages=pages))
        return playlists

    def gen_playlist_pages(id_num):
        # Four pages of 50 entries each, so up to 200 videos per playlist
        page = 'http://gdata.youtube.com/feeds/api/playlists/%s' % id_num
        pages = []
        for i in range(4):
            params = urllib.urlencode({'start-index': 1 + 50 * i, 'max-results': 50})
            pages.append('%s?%s' % (page, params))
        return pages

    def get_video_links_from_playlists(playlists):
        video_lists = []
        for pl in playlists:
            video_links = []
            for p in pl['pages']:
                feed = feedparser.parse(p)
                for en in feed.entries:
                    if re.search(r'watch', en.link):
                        video_links.append(en.link)
            pl.update(dict(video_links=video_links))
            video_lists.append(pl)
        return video_lists

    def download_videos(video_lists):
        youtube_ie = YoutubeIE()
        metacafe_ie = MetacafeIE(youtube_ie)
        youtube_pl_ie = YoutubePlaylistIE(youtube_ie)
        retcode = 0  # defined even if the first download raises
        for vl in video_lists:
            outtmpl = vl.get('title', 'no_playlist_title') + u'/%(stitle)s-%(id)s.%(ext)s'
            fd = FileDownloader({'outtmpl': outtmpl})
            fd.add_info_extractor(youtube_pl_ie)
            fd.add_info_extractor(metacafe_ie)
            fd.add_info_extractor(youtube_ie)
            try:
                retcode = fd.download(vl.get('video_links'))
            except DownloadError:
                # yes, we should handle this... maybe later
                pass
        # Exit only after every playlist has been processed
        sys.exit(retcode)

    if __name__ == '__main__':
        pls = retrieve_playlist('your_youtube_username_here')
        video_list = get_video_links_from_playlists(pls)
        download_videos(video_list)
Just put youtubeDL.py and this script, saved as playlists-dl.py, in the same directory, change ‘your_youtube_username_here’ to your user name, and run python playlists-dl.py. All the video clips in all your playlists will then be downloaded :).
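To make the paging step concrete: the script asks for 50 entries per page and four pages per playlist, so at most 200 videos per playlist are picked up. Here is a minimal sketch of the same URL construction for Python 3, where `urlencode` lives in `urllib.parse`. The function name `playlist_page_urls` and the playlist id `PL123` are made up for illustration.

```python
from urllib.parse import urlencode

BASE = 'http://gdata.youtube.com/feeds/api/playlists/%s'


def playlist_page_urls(id_num, page_size=50, n_pages=4):
    """Build one feed URL per page; start-index is 1-based."""
    urls = []
    for i in range(n_pages):
        params = urlencode({'start-index': 1 + page_size * i,
                            'max-results': page_size})
        urls.append('%s?%s' % (BASE % id_num, params))
    return urls


if __name__ == '__main__':
    for url in playlist_page_urls('PL123'):
        print(url)
```

The start indices come out as 1, 51, 101 and 151, so a playlist with more than 200 videos would need a larger `n_pages`.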
Todo:
1. Let the user specify the username on the command line.
2. If the user specifies playlist ids, download only the videos in those playlists.
3. Use multiple threads to reduce the total download time.
4. Play the downloaded videos while other downloads are still in progress …
5. …
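Items 1 and 2 could probably be handled with the standard `argparse` module. This is only a rough sketch for Python 3; the function `parse_args` and the `--playlist` option are hypothetical names, not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse a required username and an optional, repeatable playlist id."""
    parser = argparse.ArgumentParser(
        description="Download the videos in a user's YouTube playlists.")
    parser.add_argument('username', help='YouTube user name')
    parser.add_argument('-p', '--playlist', action='append', default=[],
                        metavar='ID',
                        help='only download this playlist id (repeatable)')
    return parser.parse_args(argv)


if __name__ == '__main__':
    args = parse_args()
    print(args.username, args.playlist)
```

An empty `args.playlist` would then mean "download every playlist", while a non-empty list could be used to filter the result of `retrieve_playlist` by `id_num`.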
A short introduction in Mandarin.
I have been reading Programming Collective Intelligence by Toby Segaran recently. It’s quite interesting and inspiring. In chapter 2, the author talks about how to make recommendations using the preference data of a group of people. Usually, to do this, we get the rating data, then implement a reasonable algorithm that uses the data to calculate a ‘metric’ (or score) for each item. Then we make recommendations in order of that score.
For example, if I want to extend my music collection, asking other people who share my favorite band might be the best bet. Thanks to Audioscrobbler, the Last.FM web service, we can collect this data quite easily. With pyscrobbler, a set of Python bindings to the AudioScrobbler APIs based on ElementTree, we can write a function to get the rating data and calculate the score:
    from audioscrobbler import AudioScrobblerQuery
    import operator

    # n: total number of bands to recommend.
    def getRecommendations(favoriteBand, n=10):
        # Audioscrobbler returns 50 fans by default, so we use 50 as the full score.
        FULLSCORE = 50
        fans = [f.element().get('username')
                for f in AudioScrobblerQuery(artist=favoriteBand).fans()]
        bands = {}
        for f in fans:
            for a in AudioScrobblerQuery(user=f).topartists()[:FULLSCORE]:
                name = a.name.__str__()
                rank = int(a.rank.__str__())
                # Rank #1 gets score=FULLSCORE, rank #2 gets FULLSCORE-1, and so on.
                score = FULLSCORE - rank + 1
                if name in bands:
                    bands[name] += score
                else:
                    bands[name] = score
        # Do not return the artist the user just passed in
        # (pop avoids a KeyError if the artist is missing).
        bands.pop(favoriteBand, None)
        recom = sorted(bands.items(), key=operator.itemgetter(1), reverse=True)
        return recom[:n]
Here are two examples of running this function in the IPython shell:
In [3]: getRecommendations('U2')
Out[3]:
[('Red Hot Chili Peppers', 757),
 ('Coldplay', 669),
 ('The Beatles', 532),
 ('Nirvana', 423),
 ('Aerosmith', 409),
 ('R.E.M.', 401),
 ('Moby', 381),
 ('Queen', 379),
 ('Pink Floyd', 378),
 ('Green Day', 376)]
In [4]: getRecommendations('Belle and Sebastian')
Out[4]:
[('The Beatles', 960),
 ('Radiohead', 829),
 ('The Smiths', 580),
 ('Cat Power', 529),
 ('The Arcade Fire', 478),
 ('The White Stripes', 469),
 ('Elliott Smith', 431),
 ('of Montreal', 424),
 ('The Shins', 411),
 ('Bob Dylan', 390)]
Based on my years of listening to rock music, I think the results are quite impressive.
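The scoring rule itself is easy to check offline. The sketch below (Python 3) applies the same "50 points for rank #1, 49 for rank #2, and so on" rule to hand-written data; the fan names and their top-artist lists are invented, no Last.FM call is made, and `score_bands` is a hypothetical helper, not part of pyscrobbler.

```python
import operator

FULLSCORE = 50


def score_bands(top_artists_by_fan, favorite_band, n=10):
    """top_artists_by_fan maps each fan to a list of artists, rank 1 first."""
    bands = {}
    for artists in top_artists_by_fan.values():
        for rank, name in enumerate(artists[:FULLSCORE], start=1):
            # rank 1 scores FULLSCORE, rank 2 scores FULLSCORE - 1, ...
            bands[name] = bands.get(name, 0) + FULLSCORE - rank + 1
    bands.pop(favorite_band, None)  # never recommend the band we started from
    ranked = sorted(bands.items(), key=operator.itemgetter(1), reverse=True)
    return ranked[:n]


fans = {
    'fan_a': ['U2', 'Coldplay', 'Queen'],
    'fan_b': ['Coldplay', 'U2', 'Queen', 'Moby'],
}
print(score_bands(fans, 'U2'))
# prints [('Coldplay', 99), ('Queen', 96), ('Moby', 47)]
```

Coldplay collects 49 + 50 points, Queen 48 + 48, and Moby 47 from a single fan, which is why a band liked by many fans outranks one ranked highly by only one.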
The Statistics Canada census includes the mode of transportation used by the employed labour force in different areas of Canada. Although the data is quite straightforward, it would be nice to have a ‘map’ view that makes the geographic differences easy to see. This turns out to be an interesting subject for a Google Maps mashup.
Exhibit 2.0 is a JavaScript library that makes it easy to create interactive web pages, so all that is left to do is dump the raw CSV data, together with the latitude and longitude of each location, into the proper JSON format. With geopy, simplejson and Python’s built-in csv module, here is an implementation:
    import csv, re
    from geopy import geocoders
    import simplejson

    COLUMN = ["geocode", "place_of_residence", "total_Mode_of_transportation",
              "car_truck_or_van_as_driver", "car_truck_or_van_as_passenger",
              "total_Sustainable_transportation", "public_transit", "walked",
              "bicycle", "other"]
    GMAPKEY = "Google_Map_API_Key_here"

    def transformcsv2json(file, jsonfile='output.js'):
        reader = csv.reader(open(file))
        column = COLUMN + ['city', 'province', 'latlng', 'label']
        items = []
        fc = 0

        def getlatlng(place):
            # For some unknown reason, geocoding sometimes fails several times before it succeeds
            for i in range(25):
                try:
                    g = geocoders.Google(GMAPKEY)
                    place, (lat, lng) = g.geocode(place)
                    latlng = str(lat) + ', ' + str(lng)
                    return latlng
                except Exception:
                    print "|" * 60, "Warning! ", place, " ", i, " times geocoding failed!"
            raise Exception

        for row in reader:
            city, province = mapLocationName(row[1])
            if city and province:
                place = city + " " + province
                try:
                    latlng = getlatlng(place)
                    rowplus = row + [city, province, latlng, place]
                    transItem = dict(zip(column, rowplus))
                    items.append(transItem)
                except Exception:
                    # Note the place that failed geocoding, but keep transforming the next rows anyway.
                    print "?" * 60, "Error! ", place, " ", "geocoding failed!"
                    fc += 1
        print "#" * 60, 'total number of geocoding failures: ', fc
        f = open(jsonfile, 'w')
        f.write(simplejson.dumps({'items': items}, ensure_ascii=False))
        f.close()  # the original had f.close without parentheses, which never closed the file

    def mapLocationName(location):
        # Statistics Canada province abbreviations differ from the ones Google Maps uses
        mapP = {'Alta.': 'AB', 'B.C.': 'BC', 'Man.': 'MB', 'N.B.': 'NB', 'N.L.': 'NL',
                'N.S.': 'NS', 'N.W.T.': 'NT', 'N.U.': 'NU', 'Ont.': 'ON', 'P.E.I.': 'PE',
                'Que.': 'QC', 'Sask.': 'SK', 'Y.T.': 'YT'}
        try:
            city, province = location.split(',')
            province = re.findall(r'\((.*)\)', province)[0]
            if re.search(r'\/', province):
                province = province.split('/')[0]
            province = mapP[province]
        except (ValueError, IndexError, KeyError):
            # Also covers a missing "(...)" part and an unknown abbreviation
            city = location
            province = ''
            print 'Warning! ', location, ' parsing failed'
        return city, province

    if __name__ == '__main__':
        transformcsv2json('placeofresidence.csv', 'placeofresidence.js')
Here is the resulting map mashup.
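The name-parsing and JSON-writing steps can be tried without a geocoder or an API key. Below is a small sketch for Python 3 that uses the standard `json` module in place of simplejson. The sample location strings are invented to match what the parsing code expects (a comma followed by the province in parentheses), the province table is trimmed, and `split_location` and `to_exhibit_json` are hypothetical names.

```python
import json
import re

PROVINCES = {'Ont.': 'ON', 'Que.': 'QC', 'B.C.': 'BC'}  # trimmed table


def split_location(location):
    """Return (city, province_code), or (location, '') when parsing fails."""
    try:
        city, rest = location.split(',')
        province = re.findall(r'\((.*)\)', rest)[0]
        province = province.split('/')[0]  # 'Ont./Que.' -> 'Ont.'
        return city.strip(), PROVINCES[province]
    except (ValueError, IndexError, KeyError):
        return location, ''


def to_exhibit_json(locations):
    """Wrap the parsed rows in the {'items': [...]} shape used above."""
    items = []
    for location in locations:
        city, province = split_location(location)
        if city and province:
            items.append({'city': city, 'province': province,
                          'label': city + ' ' + province})
    return json.dumps({'items': items}, ensure_ascii=False)


if __name__ == '__main__':
    print(to_exhibit_json(['Toronto, (Ont.)', 'Canada']))
```

Rows that cannot be parsed, such as the bare `'Canada'` entry, are skipped instead of stopping the run, which mirrors how the full script keeps going after a failure.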