Parser in PHP using regular expressions

You can use regular expressions in PHP with the function preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] ). Only the first two parameters are mandatory: the regular expression and the string to search, respectively.

preg_match() returns 1 if the pattern matches, 0 if it does not, and false if an error occurs. When you pass the optional $matches argument and a match is found, it is filled with an array whose item at index 0 is the whole match; from index 1 onwards come the capturing groups of your regular expression, if there are any. If no match is found, $matches is left as an empty array.
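A minimal sketch of checking the return value and then reading the groups. The pattern and the subject string are made up for illustration and are not taken from the parser in this post.

```php
<?php
// Hypothetical pattern and subject, chosen only for illustration.
$pattern = '/^(\w+):\s*(.+)$/';
$subject = 'idiom: a piece of cake';

$result = preg_match($pattern, $subject, $matches);

if ($result === 1) {
    echo $matches[0], "\n"; // whole match
    echo $matches[1], "\n"; // first capturing group
    echo $matches[2], "\n"; // second capturing group
} elseif ($result === 0) {
    echo "No match\n";
} else {
    // false signals an error; preg_last_error() returns the error code
    echo "preg_match() failed with code ", preg_last_error(), "\n";
}
```

Comparing the return value with === keeps the "no match" case (0) apart from the error case (false), which a loose comparison would treat the same.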

One detail to take into account when using regular expressions in PHP is that the pattern must be enclosed in delimiters, most commonly forward slashes (/), e.g. $multiline_meaning_re = '/^([A-Za-z ,"().\';:]+)/'; . Any character that is not alphanumeric, a backslash or whitespace can serve as the delimiter. This regular expression matches, at the start of the string, a run of one or more letters, spaces, commas, double and single quotes, parentheses, periods, colons and/or semicolons.
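As a small sketch, the same character class can be written with two different delimiters and gives the same result. The input strings are hypothetical, and the letter range is written as A-Za-z.

```php
<?php
// The same character class written with two different delimiters.
$with_slashes = '/^([A-Za-z ,"().\';:]+)/';
$with_hashes  = '#^([A-Za-z ,"().\';:]+)#';

// Hypothetical input line, used only for illustration.
$line = 'to be very easy (informal)';

preg_match($with_slashes, $line, $a);
preg_match($with_hashes, $line, $b);

var_dump($a[1] === $b[1]); // both delimiters give the same capture

// A digit is not in the class, so the match stops right before it.
preg_match($with_slashes, 'chapter 2', $c);
var_dump($c[1]);
```

A delimiter other than the forward slash is handy when the pattern itself contains slashes, since they then need no escaping.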

As a complete example, the following snippet opens a file, parses it to look for English idioms and uploads all of them to a MySQL database.

<?php
$servername = "";
$database = "";
$username = "";
$password = "";
$conn = mysqli_connect($servername, $username, $password, $database);

// Check connection
if (!$conn) {
    die("Connection failed: " . mysqli_connect_error());
}

    class Idiom {
        // "var" is legacy syntax; "public" declares the same properties.
        public $idiom = "";
        public $meaning = "";
        public $example = [];
        function print(){
            echo($this->idiom . "<br>" . $this->meaning . "<ul>");
            print_r($this->example);
            if(count($this->example) == 0 )
              echo("ERROR!!!!. There should be at least one example");
            foreach ($this->example as $value) {
                echo("<li>" . $value . "</li>");
            }
            echo("</ul>");
        }
        function upload($conn){
          echo("Uploading...");
          $example = "<ul class='list-group'>";
          foreach ($this->example as $value) {
            if($value !== '')
              $example = $example . "<li class='list-group-item'>" . $value . '</li>';
          }
          $example = $example . "</ul>";

          // Bind the values with a prepared statement instead of building the SQL by hand.
          $stmt = mysqli_prepare($conn, 'INSERT INTO idioms (`idiom`, `meaning`, `example`) VALUES (?, ?, ?)');
          if (!$stmt) {
            die("Prepare failed: " . mysqli_error($conn));
          }
          mysqli_stmt_bind_param($stmt, 'sss', $this->idiom, $this->meaning, $example);
          mysqli_stmt_execute($stmt);
          mysqli_stmt_close($stmt);
        }
    }

    $file = fopen("idioms.txt", "r") or die("Unable to open file!");
    // The letter range is A-Za-z; "A-z" would also match [ \ ] ^ _ and the backtick.
    $idiom_meaning_re     = '/^([A-Za-z,. -\/()\']+):([A-Za-z ,"().\';:\n]+)/';
    $multiline_meaning_re = '/^([A-Za-z ,"().\';:]+)/';
    $example_re           = '/^\|--([A-Za-z0-9 \',?.-;$\n"]+[^:])/';
    $new_idiom = 0;
    $new_example = 0;
    $idiom = null;
    $example = "";
    // Output one line until end-of-file
    while(!feof($file)) {
      $line = fgets($file);
      //echo($line . "<br>");
      preg_match($idiom_meaning_re, $line, $matches);

      if($matches != null){
        // Store the pending example even if a blank line already reset $new_example.
        if($idiom != null && $example !== ""){
          array_push($idiom->example, $example);
          $new_example = 0;
        }
        if($idiom != null){
          $idiom->print();
          $idiom->upload($conn);
        }
        $idiom = new Idiom;
        $idiom->example = [];
        $example = "";
        $new_idiom = 1;
        $idiom->idiom = trim($matches[1]);
        $idiom->meaning = trim($matches[2]);
      }else{
        preg_match($example_re, $line, $matches);
        if($matches != null){
           if($example !== "")
              array_push($idiom->example, $example);
           $new_idiom = 0;
           $new_example = 1;
           $example = trim($matches[1]);
        }else{
          preg_match($multiline_meaning_re, $line, $matches);
          //var_dump($matches);
          if($matches != null && $new_idiom){
            $idiom->meaning = $idiom->meaning . ' ' . trim($matches[1]);
            //echo($idiom->meaning);
          } elseif ($matches != null && $new_example) {
            $example = $example . ' ' . trim($matches[1]);
          } else {
            $new_idiom   = 0;
            $new_example = 0;
          }
        }
      }
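    }
    // Flush the last idiom: no later idiom line exists to trigger its upload.
    if($idiom != null){
      if($example !== "")
        array_push($idiom->example, $example);
      $idiom->print();
      $idiom->upload($conn);
    }
    echo("Closing file");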
    fclose($file);

    mysqli_close($conn);

?>
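To see how the three patterns divide the work, here is a rough sketch that classifies a few lines without touching the database. The sample lines are an assumption about the layout of idioms.txt (the post does not show the file), classify_line() is a hypothetical helper that is not part of the script, and the letter ranges are written as A-Za-z.

```php
<?php
// Patterns from the snippet, with the letter range written as A-Za-z.
$idiom_meaning_re     = '/^([A-Za-z,. -\/()\']+):([A-Za-z ,"().\';:\n]+)/';
$multiline_meaning_re = '/^([A-Za-z ,"().\';:]+)/';
$example_re           = '/^\|--([A-Za-z0-9 \',?.-;$\n"]+[^:])/';

// classify_line() is a hypothetical helper, not part of the original script.
// It tries the patterns in the same order as the parser's main loop.
function classify_line($line, $idiom_re, $example_re, $multiline_re) {
    if (preg_match($idiom_re, $line) === 1)     return 'idiom';
    if (preg_match($example_re, $line) === 1)   return 'example';
    if (preg_match($multiline_re, $line) === 1) return 'continuation';
    return 'other';
}

// Assumed sample of idioms.txt; the real file layout is not shown in the post.
$sample = [
    "A piece of cake: something that is very easy",
    "to do",
    "|--The exam was a piece of cake.",
    "",
];

$kinds = [];
foreach ($sample as $line) {
    $kinds[] = classify_line($line, $idiom_meaning_re, $example_re, $multiline_meaning_re);
}
print_r($kinds);
```

The order of the checks matters: the continuation pattern is the most permissive of the three, so it is tried last.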

You can find further information about preg_match() in the official PHP documentation.