Categorías
Programación

Parser in PHP using regular expressions

You can use regular expressions in PHP with the function preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] ) . Only the first two paremeters are mandatory and they are the regex and the string where you want to search respectively.

In case of finding a result, preg_match() returns an array where the item at index 0 is the whole match. From 1 onwards they are placed the different groups of your regular expressions (in case there is any). If no match is found, preg_match() returns null.

One of the details that must be taken into account when using regular expressions on PHP is that they must be enclosed by forward slashes (/), e.g. $multiline_meaning_re = ‘/^([A-za-z ,»().\’;:]+)/’; . This regular expression matches any string with any set of letters, spaces, commas, double and single quotes, parenthesis points, colon and/or semicolon.

As a complete example, the following snippet opens a file, parses it to look for English idioms and uploads all of them a MySQL database.

<?php
$servername = "";
$database = "";
$username = "";
$password = "";
$conn = mysqli_connect($servername, $username, $password, $database);

// Check connection
if (!$conn) {
    die("Connection failed: " . mysqli_connect_error());
}

    class Idiom {
        var $idiom = "";
        var $meaning = "";
        var $example = [];
        function print(){
            echo($this->idiom . "<br>" . $this->meaning . "<ul>");
            print_r($this->example);
            if(count($this->example) == 0 )
              echo("ERROR!!!!. There should be at least one example");
            foreach ($this->example as $value) {
                echo("<li>" . $value . "</li>");
            }
            echo("</ul>");
        }
        function upload($conn){
          echo("Uploading...");
          $example = "<ul class='list-group'>";
          foreach ($this->example as $value) {
            if($value !== '')
              $example = $example . "<li class='list-group-item'>" . $value . '</li>';
          }
          $example = $example . "</ul>";

          $idiom = mysqli_real_escape_string($conn, $this->idiom);
          $meaning = mysqli_real_escape_string($conn, $this->meaning);
          $example = mysqli_real_escape_string($conn, $example);

          $query = 'INSERT INTO idioms (`idiom`, `meaning`, `example`) VALUES ("'. $idiom . '","'. $meaning . '","'. $example . '")';
          mysqli_query($conn, $query);
        }
    }

    $file = fopen("idioms.txt", "r") or die("Unable to open file!");
    $idiom_meaning_re     = '/^([A-za-z,. -\/()\']+):([A-za-z ,"().\';:\n]+)/';
    $multiline_meaning_re = '/^([A-za-z ,"().\';:]+)/';
    $example_re           = '/^\|--([A-Za-z0-9 \',?.-;$\n"]+[^:])/';
    $new_idiom = 0;
    $new_example = 0;
    $idiom = null;
    $example = "";
    // Output one line until end-of-file
    while(!feof($file)) {
      $line = fgets($file);
      //echo($line . "<br>");
      preg_match($idiom_meaning_re, $line, $matches);

      if($matches != null){
        if($new_example === 1){
          array_push($idiom->example, $example);
          $new_example = 0;
        }
        if($idiom != null){
          $idiom->print();
          $idiom->upload($conn);
        }
        $idiom = new Idiom;
        $idiom->example = [];
        $example = "";
        $new_idiom = 1;
        $idiom->idiom = trim($matches[1]);
        $idiom->meaning = trim($matches[2]);
      }else{
        preg_match($example_re, $line, $matches);
        if($matches != null){
           if($example !== "")
              array_push($idiom->example, $example);
           $new_idiom = 0;
           $new_example = 1;
           $example = trim($matches[1]);
        }else{
          preg_match($multiline_meaning_re, $line, $matches);
          //var_dump($matches);
          if($matches != null && $new_idiom){
            $idiom->meaning = $idiom->meaning . ' ' . trim($matches[1]);
            //echo($idiom->meaning);
          } elseif ($matches != null && $new_example) {
            $example = $example . ' ' . trim($matches[1]);
          } else {
            $new_idiom   = 0;
            $new_example = 0;
          }
        }
      }
    }
    echo("Closing file");
    fclose($file);

    mysqli_close($conn);

?>

You can find further information about the preg_match() in the PHP official documentation.