Romancero bot: converting article text into voice with Amazon Polly and sending it to Telegram

The following Python script extracts the content of an article using Readibility (ported to Python), converts it to voice using the Amazon Polly service and finally sends the audio as a voice note to a given user in Telegram using Telethon (Telegram client for Python).

For running the script you will need to install the following Python packages:

Also, you will need to create a AWS account. If you already have an AWS account, make sure that you have a user created in the IAM Management Console with the following permissions:

User permissions for creating Polly jobs and accessing/writing files in a S3 bucket.

When creating this user, make sure you write down its ID access key and its secret access key. You will need them to configure the aws-cli client.

With this Amazon credentials, you can configure the AWS client by executing the following command:

In this step, you will need to fulfill the details with the user credentials you wrote down when creating it.

Now, you will need to create a Telegram API ID. For this, you can go to the Telegram section «Create an Application«. After following the steps described in the official documentation, you will obtain an API ID (it’s a number) and a API hash (it’s a string).

With these steps already completed, you can place all the needed details in the script and run it.

The user you specified will receive a message like this:

Parser in PHP using regular expressions

You can use regular expressions in PHP by using the function preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] ) . Only the first two paremeters are mandatory and they are the regex and the string where you want to search respectively.

In case of finding a result, preg_match() returns an array where the item at index 0 is the whole match. From 1 onwards they are placed the different groups of your regular expressions (in case there is any). If no match is found, preg_match() returns null.

One of the details that must be taken into account when using regular expressions on PHP is that they must be enclosed by forward slashes (/), e.g. $multiline_meaning_re = '/^([A-za-z ,"().\';:]+)/'; . This regular expression matches any string with any set of letters, spaces, commas, double and single quotes, parenthesis points, colon and/or semicolon.

As a complete example, the following snippet opens a file, parses it to look for English idioms and uploads all of them a MySQL database.

You can find further information about the preg_match() in the PHP official documentation.

Introduction to D3 (Data-Driven Document)

One of the first questions you may ask yourself when getting introduced in D3 is: why are we using selectAll(‘html-tag-name’) method if there is no item to select of that type?

First, let’s see an example of the situation we are talking about:

In the previous example, the only existing HTML tag is <body> . We select body ( d3.select('body') ) and then we perform the .selectAll('h2') . On its own, it makes no sense as the method won’t return any value since no <h2>  tag exists. Nevertheless, it will make sense if we keep looking at the following code.

After the select .selectAll('h2')  we attach the existing dataset to the selected items ( .data(dataset) ). Then, we use the enter()  method, which gives meaning to the previous .selectAll('h2') . When using enter() , D3 looks for the number of selected items to bind them with the data. In case of having not enough items in the selection, the enter()  method will create them.

Therefore, as .selectAll('h2')  was empty and the dataset  variable contains 9 elements, it will iterate the code 9 times. In case of having already created some  <h2>  elements, it will simply fulfill the HTML code the necessary iteration to cover all the dataset  elements. Remember that who does this iteration is  the data() method.  It parses the data set, and any method that’s chained after data() is run once for each item in the data set.

You can find more information in the official documentation at the D3js.org website.

Scales

In D3 there exists the Scale function to change the value of the data set so that it can fit in the screen. Two important methods are range() and  domain(). The domain method covers the set of input values whereas the range function convers the set of output values. Let’s see an example:

From freeCodeCamp:

The domain()  method passes information to the scale about the raw data values for the plot. The range()  method gives it information about the actual space on the web page for the visualization.

Web scraper with Scrapy