Letras de tango (tango lyrics)

2010 March 29
by Simba

We all know that tango is about lost love and broken hearts, right? Well, now we can check. Hans at the Amsterdam tango club put all 13,000 lyrics from their online database into a single HTML file, which made it rather easy to compute some statistics. I wrote a small script that counts the occurrences of each word, and the figure above represents the results, leaving out the very shortest words such as y, o, el and lo. The illustration was made with the Wordle tool that was popular among bloggers a while back. If you really like it, get the t-shirt.
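For readers curious what such a word count involves, here is a minimal Python sketch. It is not the script from the zip file. It assumes the whole lyrics page can be read as one HTML string, strips tags with a crude regular expression, treats any run of Unicode letters as a word, and drops words shorter than a hypothetical `min_len` of 4 letters. The post does not state the exact cutoff the original script used, so this sketch will not reproduce the list below exactly.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Unicode letters (no digits, no underscore).
WORD_RE = re.compile(r"[^\W\d_]+")
# Crude tag stripper; adequate for simple markup, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def count_words(html_text: str, min_len: int = 4) -> Counter:
    """Count lower-cased words in an HTML page, skipping words shorter than min_len.

    min_len is a hypothetical cutoff, not the one used in the original script.
    """
    text = TAG_RE.sub(" ", html_text)          # replace each tag with a space
    words = WORD_RE.findall(text.lower())      # split into lower-cased words
    return Counter(w for w in words if len(w) >= min_len)


page = "<p>Y el amor, amor de mi vida</p><p>Tango, mi amor</p>"
counts = count_words(page)
print(counts["amor"])  # 3
```

Short words like "el" and "mi" never reach the counter, which mirrors the post's decision to leave out the shortest words.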

It is not perfect: there are some peculiarities in the formatting, misspellings and so on, but we get the idea. I removed the most obvious sources of error, but there is still room for improvement. If you want to improve the analysis, download it and improve it yourself :-) If you do, please publish your results and your modifications, and link back here. The Python script and the raw results of the analysis are in the zip file.
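The post does not say which error sources were removed. One formatting peculiarity that can split a word's count in text taken from HTML is that the same accented letter may be stored as an HTML entity (`&oacute;`), as a single precomposed character, or as a plain letter followed by a combining accent. Assuming that problem is present, a possible clean-up step is to decode entities with `html.unescape` and then apply Unicode NFC normalization before counting. `normalize_text` is a hypothetical helper for illustration, not part of the author's script.

```python
import html
import unicodedata


def normalize_text(raw: str) -> str:
    """Hypothetical clean-up: decode HTML entities, then compose accents (NFC)."""
    text = html.unescape(raw)                  # "&oacute;" -> "ó", "&amp;" -> "&"
    return unicodedata.normalize("NFC", text)  # "o" + combining acute -> one "ó"


# Both spellings of "Corazón" end up as the same string after clean-up.
print(normalize_text("Coraz&oacute;n") == normalize_text("Corazo\u0301n"))  # True
```

Running this before tokenizing means an entity-encoded and a decomposed "corazón" are counted as one word rather than two.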

The top 100 words were (with counts):

  1. amor : 13000
  2. como : 11511
  3. para : 9413
  4. tango : 8942
  5. vida : 8018
  6. cuando : 7199
  7. más : 6601
  8. corazón : 6554
  9. porque : 5534
  10. pero : 5264
  11. quiero : 4987
  12. todo : 4857
  13. esta : 4689
  14. noche : 4411
  15. alma : 4286
  16. siempre : 4014
  17. este : 3666
  18. solo : 3461
  19. donde : 3347
  20. nunca : 3307
  21. ojos : 3052
  22. dolor : 3043
  23. entre : 3038
  24. hasta : 2735
  25. bien : 2496
  26. así : 2449
  27. tengo : 2436
  28. tiempo : 2411
  29. triste : 2408
  30. día : 2382
  31. nada : 2318
  32. mujer : 2293
  33. qué : 2235
  34. ella : 2202
  35. tanto : 2194
  36. viejo : 2163
  37. aquel : 2132
  38. cielo : 2130
  39. está : 2122
  40. canta : 2068
  41. ayer : 2048
  42. quien : 2037
  43. tiene : 2012
  44. canción : 2005
  45. milonga : 2000
  46. dios : 1997
  47. buenos : 1980
  48. desde : 1952
  49. mundo : 1948
  50. flor : 1919
  51. aires : 1910
  52. estoy : 1879
  53. pena : 1864
  54. querer : 1836
  55. otra : 1831
  56. pobre : 1828
  57. hace : 1818
  58. vivir : 1752
  59. recuerdo : 1747
  60. vals : 1701
  61. también : 1695
  62. aunque : 1685
  63. luna : 1628
  64. canto : 1618
  65. todos : 1611
  66. cada : 1602
  67. sobre : 1547
  68. barrio : 1543
  69. puedo : 1521
  70. sueño : 1516
  71. otro : 1501
  72. gardel : 1470
  73. llorar : 1461
  74. nadie : 1447
  75. cariño : 1441
  76. ilusión : 1405
  77. gran : 1401
  78. lado : 1387
  79. boca : 1370
  80. tierra : 1368
  81. hombre : 1365
  82. ahora : 1364
  83. cosas : 1357
  84. tarde : 1353
  85. camino : 1350
  86. años : 1337
  87. cantar : 1321
  88. noches : 1312
  89. eres : 1312
  90. siento : 1311
  91. dulce : 1300
  92. madre : 1297
  93. mismo : 1295
  94. después : 1289
  95. toda : 1284
  96. sólo : 1277
  97. nuestro : 1265
  98. beso : 1263
  99. viento : 1255
  100. aquí : 1244
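A ranked list in this layout can be produced from a `collections.Counter` of word counts with its `most_common` method. The sketch below assumes the counts are already in a `Counter`; `format_top` is a hypothetical helper, and the sample counts in the usage lines are copied from the first three entries above.

```python
from collections import Counter


def format_top(counts: Counter, n: int = 100) -> str:
    """Render the n most frequent words as '  rank. word : count' lines."""
    lines = []
    for rank, (word, count) in enumerate(counts.most_common(n), start=1):
        lines.append(f"  {rank}. {word} : {count}")
    return "\n".join(lines)


sample = Counter({"amor": 13000, "como": 11511, "para": 9413})
print(format_top(sample, n=2))
# prints:
#   1. amor : 13000
#   2. como : 11511
```

`most_common` orders words with equal counts (such as noches and eres, both 1312 above) in the order they were first encountered, so ties keep a stable position.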
One Response
  1. 2010 March 29
    El Chupacabra

    You should apply it to the words of tango blogs – I think that would be interesting!

    Wisdom: http://ampstertango.blogspot.com/

    Gossip: http://tangoconfidential.blogspot.com/

    History: http://tangocommuter1.blogspot.com/

    Rant: http://insearchoftango.blogspot.com/