Date: February 5th, 2010 | Category: Internet | 12 Comments
As my mother tongue is not English, I’ve always appreciated things like TED Translations, because they let me share what I find on the web with people from my country, where not everybody is fluent enough in English to hear and understand every word of a TED Talk. But I found it annoying that you could watch the video with subtitles online, yet you couldn’t download them in an appropriate format (I generally use the ‘.srt’ format).
I did some research and wrote a Python script that downloads the subtitles and converts them from JSON to the ‘.srt’ format; but these days a black-and-white command-line script is not acceptable. So I turned it into my first web app, a TED Talk Subtitle Downloader.
http://tedtalksubtitledownload.appspot.com/
(online implementation of http://estebanordano.com.ar/ted-talks-download-subtitles/)
Date: January 5th, 2010 | Category: Internet | 13 Comments
Go to the Online version
This is what I’ve been working on today: a simple console-based script to download subtitles for TED Talks, since I haven’t found a way to download them directly from the web in a compatible format (I generally use ‘.srt’ subtitles). Here is the script, written in Python: TEDTalkSubtitles.py
Key parts of the program:
A simple function to convert a value in milliseconds to something like “00:34:32,334”:
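# Convert a millisecond count into an SRT timestamp (HH:MM:SS,mmm).
def getFormatedTime(intvalue):
    mils = intvalue % 1000
    segs = (intvalue // 1000) % 60   # integer division keeps the seconds whole
    mins = (intvalue // 60000) % 60
    hors = intvalue // 3600000
    return "%02d:%02d:%02d,%03d" % (hors, mins, segs, mils)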
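As a quick check of the arithmetic, here is a minimal sketch of the same conversion written with divmod. It assumes Python 3, and the name format_srt_time is made up for this example; it is not part of the original script.

```python
def format_srt_time(total_ms):
    """Format a millisecond count as an SRT timestamp, HH:MM:SS,mmm."""
    seconds, millis = divmod(total_ms, 1000)   # split off the milliseconds
    minutes, seconds = divmod(seconds, 60)     # split off the seconds
    hours, minutes = divmod(minutes, 60)       # split off the minutes
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


# 34 min 32.334 s = 34*60000 + 32*1000 + 334 = 2072334 ms
print(format_srt_time(2072334))  # 00:34:32,334
```

Each divmod call peels off one unit, so the hours are simply whatever is left at the end and are not capped at 24.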
This recursive function fetches the languages available for the talk:
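import re

# Collect every language code that follows a "LanguageCode" key in the
# URL-encoded string (%22 is an encoded double quote).
def availableSubs(subs):
    a = subs.find("LanguageCode")
    if a == -1:
        return []
    subs = subs[a+len("LanguageCode"):]
    return [re.search("%22([^A-Z]+)%22", subs).group(1)] + availableSubs(subs)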
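The same search can also be written as a loop. The sketch below assumes Python 3 and uses a made-up sample string; the real page data may be laid out differently. The name available_subs_iter is hypothetical. It reuses the regular expression from the function above and stops quietly if a "LanguageCode" key is not followed by a quoted value.

```python
import re


def available_subs_iter(subs):
    """Iterative variant: collect the code that follows each 'LanguageCode' key."""
    marker = "LanguageCode"
    codes = []
    pos = subs.find(marker)
    while pos != -1:
        # Drop everything up to and including the key we just found.
        subs = subs[pos + len(marker):]
        match = re.search("%22([^A-Z]+)%22", subs)
        if match is None:
            break  # key without a quoted value: stop instead of raising
        codes.append(match.group(1))
        pos = subs.find(marker)
    return codes


# Hypothetical URL-encoded fragment, for illustration only.
sample = ("%7B%22LanguageCode%22%3A%22en%22%7D%2C"
          "%7B%22LanguageCode%22%3A%22es%22%7D")
print(available_subs_iter(sample))  # ['en', 'es']
```

The pattern excludes uppercase letters, so it skips the encoded colon (%3A) and matches the first all-lowercase value between two encoded quotes.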
Get information about the video
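import re
import urllib   # Python 2: provides urllib.urlopen

def getVideoParameters(urldirection):
    # Download the talk page and grab the body of the flashVars object.
    ht = urllib.urlopen(urldirection).read()
    var = re.search('flashVars = {\n([^}]+)}', ht)
    if var:
        var = var.group(1)
    else:
        return None
    # One "key:value," entry per line; drop the tabs used for indentation.
    var = [a.replace('\t', '') for a in var.split('\n')]
    for a in range(len(var)):
        if var[a]:
            # Cut each entry at its last comma.
            var[a] = var[a][:var[a].rfind(',')]
    resultado = []
    for a in var:
        # Split each entry at the first colon into (key, value).
        l = a.find(':')
        if l != -1:
            resultado.append((a[:l], a[l+1:]))
    return dict(resultado)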
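To try the parsing step without touching the network, here is a minimal Python 3 sketch that works on a string already in memory. The function name parse_flash_vars, the sample markup and the field names in it (talkId, introDuration, languageCode) are all invented for this example; the real page may use different names. Unlike the function above, it removes only a trailing comma from each entry, so a last entry without a comma is left intact.

```python
import re


def parse_flash_vars(html):
    """Return the key/value pairs of a 'flashVars = {...}' block, or None."""
    match = re.search(r"flashVars = \{\n([^}]+)\}", html)
    if match is None:
        return None
    params = {}
    for line in match.group(1).split("\n"):
        # Remove indentation and the trailing comma, if there is one.
        line = line.strip().rstrip(",")
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            params[key] = value
    return params


# Hypothetical page fragment, for illustration only.
sample = 'var flashVars = {\n\ttalkId:548,\n\tintroDuration:16500,\n\tlanguageCode:"en"\n}'
print(parse_flash_vars(sample))
```

All values come back as strings, quotes included, so a numeric field such as the intro duration would still need an int() call before being used as a time offset.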
Putting it all together:
import urllib       # Python 2: provides urllib.urlopen
import simplejson

def downloadSub(idtalk, lang, timeIntro):
    print("Downloading subtitles for language %s" % lang)
    c = simplejson.load(urllib.urlopen('http://www.ted.com/talks/subtitles/id/%s/lang/%s' % (idtalk, lang)))
    salida = open('subs_%s_%s.srt' % (idtalk, lang), 'w')
    conta = 1
    c = c['captions']
    for linea in c:
        # Each SRT cue is: index, "start --> end", text, blank line.
        salida.write("%d\n" % conta)
        conta += 1
        # Caption times are shifted by timeIntro (in milliseconds).
        salida.write("%s --> %s\n" % (getFormatedTime(timeIntro+linea['startTime']), getFormatedTime(timeIntro+linea['startTime']+linea['duration'])))
        salida.write("%s\n\n" % (linea['content'].encode('utf-8')))
    salida.close()
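The conversion itself can be checked offline. The following Python 3 sketch builds the '.srt' text in memory from a list of captions instead of downloading and writing a file. The keys startTime, duration and content are the ones used in the function above; the sample captions, the 15-second intro and the names format_time and captions_to_srt are made up for illustration.

```python
def format_time(ms):
    """Milliseconds to an SRT timestamp, HH:MM:SS,mmm."""
    secs, mils = divmod(ms, 1000)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return "%02d:%02d:%02d,%03d" % (hours, mins, secs, mils)


def captions_to_srt(captions, intro_ms=0):
    """Build SRT text from caption dicts, shifting every cue by intro_ms."""
    cues = []
    for index, cap in enumerate(captions, start=1):
        start = intro_ms + cap["startTime"]
        end = start + cap["duration"]
        # One cue: index, time range, text, then a blank line.
        cues.append("%d\n%s --> %s\n%s\n\n"
                    % (index, format_time(start), format_time(end), cap["content"]))
    return "".join(cues)


# Hypothetical captions, for illustration only.
captions = [
    {"startTime": 0, "duration": 2500, "content": "Hello."},
    {"startTime": 2500, "duration": 3000, "content": "Thank you."},
]
print(captions_to_srt(captions, intro_ms=15000))
```

With these inputs the first cue runs from 00:00:15,000 to 00:00:17,500 and the second from 00:00:17,500 to 00:00:20,500. Returning a string keeps the formatting separate from the file handling, which makes it easier to test.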