Web scraper with Scrapy

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import re

import scrapy

title_regex       = r'Letra de\s+([a-zA-Z0-9áéíóúñü_,!¡¿?"() ]+)\s-'
empty_lines_regex = r"^\s+$"
tabs_regex        = r"^[\n\t]+"

class ConchaPiquerSpider(scrapy.Spider):
    name = 'conchitabot'
    allowed_domains = ['']
    start_urls = ['']
    custom_settings = {
        'FEED_EXPORT_ENCODING': 'utf-8',
    }
    BASE_URL = ''

    def parse(self, response):
        # Follow every lyric link found in the index list
        lyric_links = response.css(".lista_uno li a::attr(href)").extract()
        for link in lyric_links:
            absolute_url = self.BASE_URL + link
            yield scrapy.Request(absolute_url, callback=self.parse_lyric)

    def parse_lyric(self, response):
        # Take the song title from the <h1> heading ("Letra de <title> - ...")
        title = None
        raw_titles = response.css("h1").extract()
        for raw_title in raw_titles:
            match = re.search(title_regex, raw_title)
            if match:
                title = match.group(1)
        if title is None:
            # No recognizable title: skip this page
            return

        # Join all the text fragments of the lyric into a single string
        raw_text = response.css("#HOTWordsTxt::text").extract()
        single_string = "".join(raw_text)

        lyric = self.clean_lyric(single_string)

        # Save each lyric as ./letras/<title>.txt
        os.makedirs("./letras", exist_ok=True)
        with open("./letras/" + title + ".txt", "w", encoding="utf-8") as text_file:
            text_file.write(lyric)

    def clean_lyric(self, dirty_str):
        # Strings are already Unicode in Python 3, so no .encode() is needed
        no_spaces = re.sub(r"^\s+", '', dirty_str)
        no_tabs = re.sub(r"[\n\t]+", '', no_spaces)
        return no_tabs

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os

def get_sorted_files(directory):
    # Collect the full path of every file below the directory, sorted by name
    filenamelist = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            fullname = os.path.join(root, name)
            filenamelist.append(fullname)
    return sorted(filenamelist)

text = "<head><meta charset='utf-8'></head>"
folder = "./letras/"
files = get_sorted_files(folder)
for filename in files:
    # The file name without its extension is the song title
    filebase = os.path.splitext(os.path.basename(filename))[0]
    with open(filename, 'r', encoding="utf-8") as f:
        text = text + "<h1>" + filebase + "</h1><pre>" + f.read() + "</pre>"

# Write every lyric into a single HTML page
with open("unified.html", "w", encoding="utf-8") as unified:
    unified.write(text)
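To try the title regex and the cleaning step without crawling anything, here is a minimal standalone sketch. The sample heading and lyric text are made up, and the exact markup of the real lyric pages is an assumption; `clean_lyric` is copied here as a plain function so it can run outside Scrapy.

```python
import re

# Same pattern as title_regex in the spider
title_regex = r'Letra de\s+([a-zA-Z0-9áéíóúñü_,!¡¿?"() ]+)\s-'

def clean_lyric(dirty_str):
    # Same steps as the spider's clean_lyric: strip leading whitespace,
    # then delete every run of newlines and tabs
    no_spaces = re.sub(r"^\s+", '', dirty_str)
    no_tabs = re.sub(r"[\n\t]+", '', no_spaces)
    return no_tabs

# Hypothetical <h1> content, assuming the "Letra de <title> - <artist>" layout
sample_h1 = "<h1>Letra de Ojos verdes - Concha Piquer</h1>"
match = re.search(title_regex, sample_h1)
print(match.group(1))

# Hypothetical raw text fragment with leading whitespace and a tab
print(repr(clean_lyric("\n\t  Primera línea\t de la letra")))
```

Note that the greedy title group first swallows the space before the dash and then backtracks so that `\s-` can match, which is why the captured title has no trailing space.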
Signal theory

Basis in a vector space

A vector space basis is the skeleton from which a vector space is built. It allows any signal to be decomposed into a linear combination of simple building blocks: the basis vectors. The Fourier Transform is just a change of basis.

A basis is a set of vectors such that any vector of the space can be written as a linear combination of them.

\[ w^{(k)} \leftarrow \text{basis} \]

The canonical basis of \(\mathbb{R}^2\) is:

\(e^{(0)} = [1, 0]^T \ \ e^{(1) } = [0,1]^T \)

However, \(\mathbb{R}^2\) has other bases, for example:

\(e^{(0)} = [1, 0]^T \ \ e^{(1) } = [1,1]^T \)

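This latter basis is still linearly independent, since neither vector is a multiple of the other, but it is not orthogonal: \(\left\langle e^{(0)}, e^{(1)} \right\rangle = 1\), so part of the information carried by \(e^{(0)}\) is also present in \(e^{(1)}\).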
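Both claims can be checked numerically with a minimal sketch in plain Python: a nonzero 2x2 determinant means the two vectors are linearly independent (so they form a basis of \(\mathbb{R}^2\)), the inner product tells whether they are orthogonal, and Cramer's rule gives the unique coordinates of a vector in that basis. The helper names `det2`, `inner` and `coords` are illustrative.

```python
def det2(u, v):
    # Determinant of the 2x2 matrix whose columns are u and v
    return u[0] * v[1] - u[1] * v[0]

def inner(u, v):
    # Real inner product in R^2
    return u[0] * v[0] + u[1] * v[1]

def coords(x, u, v):
    # Solve x = a*u + b*v with Cramer's rule (unique when det2(u, v) != 0)
    d = det2(u, v)
    a = det2(x, v) / d
    b = det2(u, x) / d
    return a, b

e0, e1 = [1, 0], [1, 1]
print(det2(e0, e1))            # nonzero: linearly independent
print(inner(e0, e1))           # nonzero: not orthogonal
print(coords([3, 2], e0, e1))  # the only pair (a, b) with a*e0 + b*e1 = [3, 2]
```

Here \([3, 2]^T = 1 \cdot e^{(0)} + 2 \cdot e^{(1)}\), and no other pair of coefficients works, which is exactly the uniqueness property of a basis.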

Formal definition

Let H be a vector space.

Let \(W = \left\{ w^{(k)} \right\}\), \(k = 0, \dots, K-1\), be a set of vectors from H.

W is a basis of H if:

  1. Every \(x \in H\) can be written as \( x = \sum_{k=0}^{K-1} \alpha_k w^{(k)}, \ \ \alpha_k \in \mathbb{C} \)
  2. The coefficients \( \alpha_k \) are unique; that is, the basis vectors are linearly independent, so each point can be expressed by only one combination of them.

A basis is orthogonal when the inner product of any two distinct basis elements is zero:

\( \left \langle w^{(k)}, w^{(n)} \right \rangle = 0, \ \ \text{for } k \neq n \)

If, in addition, the inner product of every basis element with itself equals 1, the basis is orthonormal.
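A minimal sketch of these two definitions in plain Python, using the inner product for finite-length signals, \(\sum_n x^*[n] y[n]\), so that it also works for complex vectors. The function names and the tolerance `tol` (needed because floating-point results are rarely exactly 0 or 1) are illustrative choices.

```python
import math

def inner(x, y):
    # <x, y> = sum_n conj(x[n]) * y[n] (conjugate on the first argument)
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def is_orthogonal(basis, tol=1e-12):
    # Every pair of distinct elements must have zero inner product
    return all(abs(inner(basis[k], basis[n])) <= tol
               for k in range(len(basis))
               for n in range(len(basis)) if k != n)

def is_orthonormal(basis, tol=1e-12):
    # Orthogonal, and every element has <w, w> = 1
    return is_orthogonal(basis, tol) and all(
        abs(inner(w, w) - 1) <= tol for w in basis)

s = 1 / math.sqrt(2)
print(is_orthonormal([[1, 0], [0, 1]]))    # canonical basis
print(is_orthogonal([[2, 0], [0, 2]]), is_orthonormal([[2, 0], [0, 2]]))
print(is_orthonormal([[s, s], [-s, s]]))   # canonical basis rotated by 45 degrees
print(is_orthogonal([[1, 0], [1, 1]]))     # the non-orthogonal basis above
```

The scaled basis \(\{[2,0]^T, [0,2]^T\}\) shows the difference between the two notions: it is orthogonal, but its elements have squared norm 4, so it is not orthonormal.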

How to change the basis?

An element of the vector space can be represented in a new basis by projecting it onto the new basis vectors. If \(x\) is represented in the basis \(\left\{ w^{(k)} \right\}\) with coefficients \(\alpha_k\), it can also be represented as a linear combination of another basis \(\left\{ v^{(k)} \right\}\) with coefficients \( \beta_k\). In mathematical notation:

\[ x = \sum_{k=0}^{K-1} \alpha_k w^{(k)} = \sum_{k=0}^{K-1} \beta_k v^{(k)} \]

If \(\left\{ v^{(k)} \right\}\) is orthonormal, each new coefficient \(\beta_h\) is the projection of \(x\) onto \(v^{(h)}\). It can be computed as a linear combination of the original coefficients \(\alpha_k\), weighted by the projections of the new basis vectors onto the original ones:

\[\beta_h = \left \langle v^{(h)}, x \right \rangle = \left \langle v^{(h)}, \sum_{k=0}^{K-1} \alpha_k w^{(k)} \right \rangle = \sum_{k=0}^{K-1} \alpha_k \left\langle v^{(h)}, w^{(k)} \right \rangle \]

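This operation can also be written in matrix form, with \(c_{hk} = \left\langle v^{(h)}, w^{(k)} \right\rangle\):

\[ \begin{bmatrix}
\beta_0 \\
\vdots \\
\beta_{K-1}
\end{bmatrix} = \begin{bmatrix}
c_{00} & c_{01} & \cdots & c_{0\left(K-1 \right )}\\
\vdots & \vdots & \ddots & \vdots \\
c_{\left(K-1 \right )0} & c_{\left(K-1 \right )1} & \cdots & c_{\left(K-1 \right )\left(K-1 \right )}
\end{bmatrix} \begin{bmatrix}
\alpha_0 \\
\vdots \\
\alpha_{K-1}
\end{bmatrix} \]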
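A small worked case in plain Python, as a sketch: the original basis is the non-orthogonal pair \(\{[1,0]^T, [1,1]^T\}\), the new orthonormal basis is the canonical one, and real vectors are assumed so no conjugation is needed. The matrix entries are the projections \(c_{hk} = \langle v^{(h)}, w^{(k)} \rangle\), and the result is compared with computing \(\beta_h = \langle v^{(h)}, x \rangle\) directly. Function names are illustrative.

```python
def inner(u, v):
    # Real inner product: sum of element-wise products
    return sum(a * b for a, b in zip(u, v))

def change_of_basis_matrix(new_basis, old_basis):
    # C[h][k] = <v^(h), w^(k)>
    return [[inner(v, w) for w in old_basis] for v in new_basis]

def mat_vec(C, alpha):
    # beta = C alpha
    return [sum(c * a for c, a in zip(row, alpha)) for row in C]

w = [[1, 0], [1, 1]]   # original (non-orthogonal) basis
v = [[1, 0], [0, 1]]   # new orthonormal basis
alpha = [1, 2]         # coefficients of x in basis w

# x = sum_k alpha_k w^(k)
x = [sum(a * wk[i] for a, wk in zip(alpha, w)) for i in range(len(w[0]))]

C = change_of_basis_matrix(v, w)
beta = mat_vec(C, alpha)
print(C)
print(beta)
print([inner(vh, x) for vh in v])  # direct projections, same as beta
```

Both routes give \(\beta = [3, 2]^T\), the coordinates of \(x\) in the canonical basis.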

Changes of basis are widely used in linear algebra. A well-known example is the Discrete Fourier Transform (DFT).
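As a sketch of the DFT seen as a change of basis, assume the orthonormal Fourier basis \(v^{(h)}[n] = e^{j 2\pi h n / N} / \sqrt{N}\). Then each DFT coefficient is the projection \(\beta_h = \langle v^{(h)}, x \rangle\), and because the basis is orthonormal, \(x = \sum_h \beta_h v^{(h)}\) recovers the signal. Many texts define the DFT without the \(1/\sqrt{N}\) factor, which only rescales the coefficients; the function names here are illustrative.

```python
import cmath
import math

def inner(x, y):
    # <x, y> = sum_n conj(x[n]) * y[n]
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def dft_basis(N):
    # Orthonormal Fourier basis: v^(h)[n] = exp(j*2*pi*h*n/N) / sqrt(N)
    return [[cmath.exp(2j * math.pi * h * n / N) / math.sqrt(N) for n in range(N)]
            for h in range(N)]

def dft(x):
    # Change of basis: beta_h = <v^(h), x>
    return [inner(vh, x) for vh in dft_basis(len(x))]

def idft(beta):
    # Back to the canonical basis: x[n] = sum_h beta_h v^(h)[n]
    N = len(beta)
    basis = dft_basis(N)
    return [sum(beta[h] * basis[h][n] for h in range(N)) for n in range(N)]

x = [1, 2, 3, 4]
beta = dft(x)
print([round(abs(b), 6) for b in beta])
print([round(value.real, 6) for value in idft(beta)])
```

If NumPy is available, `numpy.fft.fft(x, norm="ortho")` uses the same normalization and should agree with `dft(x)` up to rounding.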

Inner product in vector space

The inner product is an operation that measures the similarity between vectors. More generally, it takes two elements of a vector space as operands and returns a scalar in the set of complex numbers:

\[ \left \langle \cdot, \cdot \right \rangle : V \times V \rightarrow \mathbb{C}  \]

Formal properties

For \(x, y, z \in V\) and \(\alpha \in \mathbb{C}\), the inner product must fulfill the following rules:

Distributivity over vector addition:

\( \left \langle x+y, z \right \rangle = \left \langle x, z \right \rangle + \left \langle y, z \right \rangle \)

Conjugate symmetry (for real vectors it reduces to commutativity):

\( \left \langle x,y \right \rangle  = \left \langle y, x \right \rangle^* \)

Behavior under scalar multiplication (conjugate-linear in the first argument, linear in the second):

\(  \left \langle \alpha x, y \right \rangle =  \alpha^* \left \langle x, y \right \rangle \)

\(  \left \langle  x, \alpha y \right \rangle =  \alpha \left \langle x, y \right \rangle \)

The inner product of a vector with itself must be a non-negative real number:

\(  \left \langle  x, x \right \rangle \geq 0 \)

It is zero if and only if the vector is the null element:

\( \left \langle x,x \right \rangle = 0 \Leftrightarrow x = 0 \)

Inner product in \(\mathbb{R}^2 \)

The inner product in \( \mathbb{R}^2\) is defined as follows:

\( \left \langle x, y \right \rangle = x_0 y_0 + x_1 y_1 \)

The inner product of a vector with itself is its squared norm:

\( \left \langle x, x \right \rangle = x^2_0 + x^2_1 = \left \| x \right \|^2 \)
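For example, for \(x = [3, 4]^T\) and \(y = [-4, 3]^T\):

\[ \left \langle x, x \right \rangle = 3^2 + 4^2 = 25 \ \Rightarrow \ \left \| x \right \| = 5, \qquad \left \langle x, y \right \rangle = 3 \cdot (-4) + 4 \cdot 3 = 0 \]

so \(x\) and \(y\) are orthogonal.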

Inner product in finite length signals

For complex-valued signals of finite length \(N\), the inner product is defined as:

\[ \left \langle x ,y \right \rangle = \sum_{n= 0}^{N-1} x^*[n] y[n] \]
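A small sketch of this definition in plain Python (complex numbers are built in), checking the properties listed above on made-up sample signals. It is a numerical illustration on one example, not a proof.

```python
def inner(x, y):
    # <x, y> = sum_{n=0}^{N-1} conj(x[n]) * y[n]
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

x = [1 + 1j, 2, -1j]
y = [3, 1 - 2j, 2 + 2j]
alpha = 2 - 1j

# Conjugate symmetry: <x, y> = <y, x>*
print(inner(x, y), inner(y, x).conjugate())
# Conjugate-linear in the first argument
print(inner([alpha * a for a in x], y), alpha.conjugate() * inner(x, y))
# Linear in the second argument
print(inner(x, [alpha * b for b in y]), alpha * inner(x, y))
# Self inner product: real and non-negative (the squared norm)
print(inner(x, x))
```

By hand, \(\langle x, x \rangle = |1+j|^2 + |2|^2 + |-j|^2 = 2 + 4 + 1 = 7\).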

Properties of vector spaces

A vector space must satisfy the following rules for any \(x, y, z \in V\) and scalars \(\alpha, \beta\):
Addition must be commutative:
\( x + y = y + x \)

Addition must be associative:
\( (x+y)+z = x + (y + z) \)

Scalar multiplication must be distributive with respect to vector addition:
\( \alpha\left(x + y \right) = \alpha x + \alpha y\)

Scalar multiplication must be distributive with respect to the addition of scalars:
\( \left( \alpha + \beta \right) x = \alpha x + \beta x \)

Scalar multiplication must be associative:
\( \alpha\left(\beta x \right) = \left(\alpha \beta \right) x \)

There must be a null element:
\( \exists 0 \in V \ \ | \ \ x + 0 = 0 + x = x \)

Every element of the vector space must have an inverse element:
\( \forall x \in V \ \ \exists (-x) \in V \ \ | \ \ x + (-x) = 0\)
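These rules can be spot-checked numerically, for instance in \(\mathbb{R}^3\) with integer-valued sample vectors and scalars so that every comparison is exact. This is a hypothetical sanity check on a few samples, not a proof, and the helper names `add` and `scale` are illustrative.

```python
import itertools

def add(x, y):
    # Vector addition in R^3, component by component
    return tuple(a + b for a, b in zip(x, y))

def scale(alpha, x):
    # Scalar multiplication
    return tuple(alpha * a for a in x)

zero = (0, 0, 0)
vectors = [(1, 2, 3), (-4, 0, 5), (2, -1, 1)]
scalars = [2, -3, 0]

for x, y, z in itertools.product(vectors, repeat=3):
    assert add(x, y) == add(y, x)                                   # commutativity
    assert add(add(x, y), z) == add(x, add(y, z))                   # associativity
    for alpha, beta in itertools.product(scalars, repeat=2):
        assert scale(alpha, add(x, y)) == add(scale(alpha, x), scale(alpha, y))
        assert scale(alpha + beta, x) == add(scale(alpha, x), scale(beta, x))
        assert scale(alpha, scale(beta, x)) == scale(alpha * beta, x)
for x in vectors:
    assert add(x, zero) == add(zero, x) == x                        # null element
    assert add(x, scale(-1, x)) == zero                             # inverse element
print("all rules hold for the sampled vectors")
```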