Date: February 5th, 2010 | Category: Internet | 12 Comments
As my mother tongue is not English, I've always appreciated things like TED Translations, because they let me share what I find on the web with people from my country, where not everybody is fluent enough in English to hear and understand every word of a TED Talk. But I found it annoying that you could watch the video with subtitles online but couldn't download them in an appropriate format (I generally use the '.srt' format).
I did some research and wrote a Python script that downloads the subtitles and converts them from JSON to the '.srt' format; but these days a black-and-white command-line script is not acceptable. So I turned it into my first web app, a TED Talk Subtitle Downloader.
http://tedtalksubtitledownload.appspot.com/
(online implementation of http://estebanordano.com.ar/ted-talks-download-subtitles/)
Date: February 1st, 2010 | Category: Personal | 3 Comments
I loved the Global Game Jam!! Everything was great: the community, the people, the experience, and the game we finished in time, "Land the Mime". This is the postmortem that I wrote.
Date: January 5th, 2010 | Category: Internet | 16 Comments
Go to the Online version
This is what I've been working on today. It's a simple console-based script to download subtitles for TED Talks, since I haven't found a way to download them directly from the web in a compatible format (I generally use '.srt' subtitles). Here is the script, written in Python: TEDTalkSubtitles.py
Key parts of the program:
A simple function to convert a value in milliseconds to something like "00:34:32,334":
    def getFormatedTime(intvalue):
        # Split a millisecond count into the fields of an SRT timestamp.
        # '//' keeps the divisions integer on both Python 2 and Python 3.
        mils = intvalue % 1000
        segs = (intvalue // 1000) % 60
        mins = (intvalue // 60000) % 60
        hors = intvalue // 3600000
        return "%02d:%02d:%02d,%03d" % (hors, mins, segs, mils)
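As a quick check of the conversion, here is a minimal sketch of the same arithmetic written with divmod. The name format_srt_time is made up for this example and is not part of TEDTalkSubtitles.py.

```python
def format_srt_time(total_ms):
    """Convert a millisecond count to an SRT timestamp (HH:MM:SS,mmm)."""
    seconds, millis = divmod(total_ms, 1000)   # whole seconds + leftover ms
    minutes, seconds = divmod(seconds, 60)     # whole minutes + leftover s
    hours, minutes = divmod(minutes, 60)       # whole hours + leftover min
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


# 34 minutes, 32 seconds and 334 milliseconds, expressed in milliseconds.
print(format_srt_time(2072334))  # 00:34:32,334
```

Note that hours are not wrapped, so a value past 24 hours simply prints a larger hour field.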
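This recursive function fetches the languages available for the talk:
    import re

    def availableSubs(subs):
        # Find the next "LanguageCode" marker; stop when none is left.
        a = subs.find("LanguageCode")
        if a == -1:
            return []
        # Drop everything up to and including the marker.
        subs = subs[a+len("LanguageCode"):]
        # Take the first %22-quoted value with no capital letters, then recurse.
        return [re.search("%22([^A-Z]+)%22", subs).group(1)] + availableSubs(subs)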
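The same extraction can probably be done without recursion, which avoids re-slicing the string on every call. The sketch below is one possible loop-based variant, not the script's own code; the name available_subs_iter and the SAMPLE string are invented for illustration, and the real page data may look different.

```python
import re

# Hypothetical URL-encoded fragment; the real ted.com data may differ.
SAMPLE = ("%7B%22LanguageCode%22%3A%22en%22%7D%2C"
          "%7B%22LanguageCode%22%3A%22es%22%7D")


def available_subs_iter(subs):
    """Collect the language code that follows each "LanguageCode" marker."""
    codes = []
    # Everything after the first split piece starts right after a marker.
    for chunk in subs.split("LanguageCode")[1:]:
        match = re.search("%22([^A-Z]+)%22", chunk)
        if match:  # skip a marker that has no quoted value after it
            codes.append(match.group(1))
    return codes


print(available_subs_iter(SAMPLE))  # ['en', 'es']
```

Unlike the recursive version, this one skips a marker with no matching value instead of raising an exception.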
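Get information about the video:
    import re
    import urllib  # Python 2 API (urllib.request.urlopen on Python 3)

    def getVideoParameters(urldirection):
        # Download the talk page and grab the body of the flashVars object.
        ht = urllib.urlopen(urldirection).read()
        var = re.search('flashVars = {\n([^}]+)}', ht)
        if var:
            var = var.group(1)
        else:
            return None
        # One "key:value," entry per line: remove tabs, cut at the last comma.
        var = [a.replace('\t', '') for a in var.split('\n')]
        for a in range(len(var)):
            if var[a]:
                var[a] = var[a][:var[a].rfind(',')]
        # Split each entry at its first colon and collect the pairs in a dict.
        resultado = []
        for a in var:
            l = a.find(':')
            if l != -1:
                resultado.append((a[:l], a[l+1:]))
        return dict(resultado)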
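To try the parsing step without hitting the network, something like the following sketch can be run against a string. SAMPLE_PAGE and its field names are made up for this example, and parse_flash_vars is a hypothetical helper, not a function from the script.

```python
import re

# Hypothetical page fragment; the real field names on ted.com may differ.
SAMPLE_PAGE = ('var flashVars = {\n\tti:"248",\n\tintroDuration:16500,'
               '\n\tlanguageCode:"en"\n}')


def parse_flash_vars(html):
    """Return the flashVars entries as a dict of raw strings, or None."""
    match = re.search(r'flashVars = \{\n([^}]+)\}', html)
    if match is None:
        return None
    params = {}
    for line in match.group(1).split('\n'):
        # Drop indentation and a trailing comma, then split at the first colon.
        key, sep, value = line.strip().rstrip(',').partition(':')
        if sep:  # ignore blank lines and lines without a colon
            params[key] = value
    return params


print(parse_flash_vars(SAMPLE_PAGE))
```

This variant uses rstrip(',') on purpose: cutting at the result of rfind(',') drops the final character of an entry that has no trailing comma, which is usually the last one.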
Putting it all together:
    import urllib  # Python 2 API
    import simplejson

    def downloadSub(idtalk, lang, timeIntro):
        print("Downloading subtitles for language %s" % lang)
        # Fetch the captions for this talk and language as JSON.
        c = simplejson.load(urllib.urlopen('http://www.ted.com/talks/subtitles/id/%s/lang/%s' % (idtalk, lang)))
        salida = open('subs_%s_%s.srt' % (idtalk, lang), 'w')
        conta = 1
        c = c['captions']
        for linea in c:
            # Each SRT block: sequence number, time range, text, blank line.
            salida.write("%d\n" % conta)
            conta += 1
            salida.write("%s --> %s\n" % (getFormatedTime(timeIntro+linea['startTime']), getFormatedTime(timeIntro+linea['startTime']+linea['duration'])))
            salida.write("%s\n\n" % (linea['content'].encode('utf-8')))
        salida.close()
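The JSON-to-SRT step can also be exercised offline. The sketch below builds the SRT text in memory from a small invented payload; the keys captions, startTime, duration and content come from the script, while the function names and the sample values are assumptions made for this example.

```python
import json

# Invented payload that uses the same keys the script reads.
SAMPLE_JSON = ('{"captions": ['
               '{"startTime": 0, "duration": 2500, "content": "Hello"},'
               '{"startTime": 2500, "duration": 1500, "content": "World"}]}')


def format_time(ms):
    """Millisecond count to an SRT timestamp (HH:MM:SS,mmm)."""
    return "%02d:%02d:%02d,%03d" % (
        ms // 3600000, (ms // 60000) % 60, (ms // 1000) % 60, ms % 1000)


def captions_to_srt(captions, intro_ms=0):
    """Render a list of caption dicts as SRT text, shifted by intro_ms."""
    blocks = []
    for index, cap in enumerate(captions, start=1):
        start = intro_ms + cap["startTime"]
        end = start + cap["duration"]
        # Sequence number, time range, text, then a blank separator line.
        blocks.append("%d\n%s --> %s\n%s\n\n" % (
            index, format_time(start), format_time(end), cap["content"]))
    return "".join(blocks)


captions = json.loads(SAMPLE_JSON)["captions"]
print(captions_to_srt(captions, intro_ms=15000))
```

Keeping the rendering separate from the download makes it easy to check the timing offset (timeIntro in the script) without a network connection.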
Related to:
Parsing and Converting TED Talks JSON Subtitles
Download subtitles from TED talks for offline viewing
Date: December 14th, 2009 | Category: Computing | 4 Comments
Well, as a first step before releasing my application for browsing IOL (the platform that hosts my university's course content) in a decent way (JIOLSucker doesn't do it for me), I started looking into how to browse a web page with Python.
The first thing that kept me busy was seeing how these things are done in Python: how there are two almost identically named libraries that are both needed (urllib vs urllib2), how to build a request, and so on.
I went through the code of IOL2Twitter, published by the great folks at Zauber (a very good Argentine software company... they do very good work), but it only handled the news (which didn't require a login). So I kept digging to see how to log in to IOL.
It turns out you have to put a magic number in the form as a hidden value... the first WTF. The "command" is also passed to the page that controls everything through a hidden form value.
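As a rough illustration of building such a request, here is a sketch using Python 3's urllib (where urllib and urllib2 were merged). Everything specific in it is hypothetical: the URL, the field names and the magic number are placeholders, since the post does not give the real ones. The request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URL; the real login address is not given in the post.
LOGIN_URL = "http://example.invalid/login"


def build_login_request(user, password, magic, command):
    """Build (but do not send) a form POST that includes the hidden fields."""
    fields = {
        "user": user,
        "password": password,
        "magic": magic,      # hypothetical name for the hidden magic number
        "command": command,  # hypothetical name for the hidden "command"
    }
    body = urlencode(fields).encode("ascii")
    # Passing data makes urllib treat the request as a POST.
    return Request(LOGIN_URL, data=body)


req = build_login_request("alice", "secret", "12345", "login")
print(req.get_method(), req.data)
```

Sending it would be a matter of passing the request to urllib.request.urlopen, usually through an opener that keeps cookies so the session survives the login.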