php.mo – convert gettext .po files to binary .mo files in PHP, without using Poedit

→ Skip to: php.mo on Github

When making translatable/internationalised websites (a process known as i18n, short for ‘internationalization’), many people and software projects use the gettext library, which in PHP often looks like this: <?php echo _('My Cool Text'); ?>.

This function, wrapped around the text you would otherwise echo or print, looks up a translated version of the input string (in the .mo file configured at the start of the application) and returns it. If the string hasn’t been translated, it returns the original English text by default – which is nice.

Basically, this means that if Mr. Webmaster hasn’t translated the string “whatsup, dickface” into Italiano, the visitor will just see that text, instead of an ugly untranslated “welcome_msg” text, or a total blank.
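For context, here is a minimal sketch of the setup that makes _() find a catalogue. It assumes the standard gettext directory layout (locale/<locale>/LC_MESSAGES/<domain>.mo), a text domain called `messages`, and a hypothetical helper name, `init_translations()`; none of this is part of php.mo itself.

```php
<?php
// Hypothetical helper: point gettext at a catalogue such as
// ./locale/it_IT/LC_MESSAGES/messages.mo (standard gettext layout).
// Returns false if the gettext extension is missing or the locale isn't available.
function init_translations(string $locale, string $domain, string $localeDir): bool
{
    // The gettext extension is optional in PHP, so check before calling into it.
    if (!function_exists('bindtextdomain')) {
        return false;
    }
    putenv("LC_ALL=$locale");                     // some platforms read the environment
    $ok = setlocale(LC_ALL, $locale) !== false;   // false if the locale isn't installed
    bindtextdomain($domain, $localeDir);          // where to look for <domain>.mo files
    bind_textdomain_codeset($domain, 'UTF-8');    // return UTF-8 strings
    textdomain($domain);                          // default domain used by _()
    return $ok;
}

init_translations('it_IT.UTF-8', 'messages', __DIR__ . '/locale');

// _() returns the translation, or the original text when none is found.
echo function_exists('_') ? _('My Cool Text') : 'My Cool Text', "\n";
```

One gotcha worth knowing: if the requested locale isn’t installed on the server, setlocale() fails and gettext may quietly hand back the untranslated English text, which looks exactly like a missing translation.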

.po

There are many ways to create the translation catalogue files (called .po files), which list every piece of English text and its representation in the foreign language, along with helpful hints for translators. One of the most widely used programs for this is Poedit.

Poedit has some serious quirks and could definitely use a UI upgrade (1998 called… they want their interface back) – but it works, and when you save a project, it generates not only the .po file (which can be opened and edited in the future) but also a binary ‘machine object’ (.mo) file. Although it’s unreadable to us, the .mo file is what gettext actually loads at runtime: gettext can’t read .po files directly, and the binary format is built for fast lookups.
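For the curious, the .mo format is simple enough to write by hand. The sketch below is not php.mo’s or php-msgfmt’s actual code (`mo_build()` is a hypothetical name); it follows the layout described in the GNU gettext manual: a 28-byte header starting with the magic number 0x950412de, a table of (length, offset) pairs for the original strings, a matching table for the translations, then the NUL-terminated strings themselves. It assumes plain msgid => msgstr pairs (contexts and plural forms would have to be packed into single keys and values by the caller) and writes no hash table, which gettext treats as optional.

```php
<?php
// Minimal .mo writer sketch: little-endian, no hash table.
// $entries maps msgid => msgstr. Empty translations are dropped so gettext
// falls back to the original text (GNU msgfmt also omits untranslated entries).
function mo_build(array $entries): string
{
    $entries = array_filter($entries, fn($s) => (string) $s !== '');
    // Without a hash table, gettext binary-searches the original strings,
    // so they must be sorted byte-wise.
    ksort($entries, SORT_STRING);

    $n = count($entries);
    $origTableOff  = 28;                       // header is 7 x 32-bit words
    $transTableOff = $origTableOff + 8 * $n;   // each table entry: length + offset
    $stringsOff    = $transTableOff + 8 * $n;

    $origTable = $transTable = $strings = '';
    $offset = $stringsOff;
    foreach (array_keys($entries) as $msgid) {
        $msgid = (string) $msgid;              // PHP turns numeric keys into ints
        $origTable .= pack('V2', strlen($msgid), $offset);
        $strings   .= $msgid . "\0";           // stored length excludes the NUL
        $offset    += strlen($msgid) + 1;
    }
    foreach ($entries as $msgstr) {
        $msgstr = (string) $msgstr;
        $transTable .= pack('V2', strlen($msgstr), $offset);
        $strings    .= $msgstr . "\0";
        $offset     += strlen($msgstr) + 1;
    }

    // magic, revision, count, table offsets, hash table size (0) and offset
    $header = pack('V7', 0x950412de, 0, $n, $origTableOff, $transTableOff, 0, $stringsOff);
    return $header . $origTable . $transTable . $strings;
}

file_put_contents(sys_get_temp_dir() . '/example.mo', mo_build([
    'Username' => 'Nombre de usuario',
    'Password' => 'Contraseña',
    'Login'    => '',   // untranslated: left out of the catalogue
]));
```

Because there’s no hash table, lookups rely on a binary search over the sorted originals, which is why the ksort() matters. A real catalogue should also carry the empty-msgid header entry declaring the charset, just like the .po file does.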

But if you’re using a custom build script to generate .po files from, say, a Google Documents spreadsheet with all the translations for every language in it (as on my latest project – it’s fairly easy, as it happens), you also need to generate .mo files (not so easy, unless you speak binary).
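The .po side of such a build script is mostly a matter of escaping strings correctly. Here’s a rough sketch, assuming the spreadsheet has been exported as CSV with the English text in the first column and the translation in the second; the function names are hypothetical, and real catalogues usually carry more header fields (Language, Plural-Forms and so on) than the minimal charset header shown here.

```php
<?php
// Quote a string for a .po file: escape backslashes, double quotes,
// newlines and tabs C-style (UTF-8 bytes pass through untouched).
function po_quote(string $s): string
{
    return '"' . addcslashes($s, "\\\"\n\t") . '"';
}

// Read [msgid, msgstr] pairs from a two-column CSV export (assumed layout).
function rows_from_csv(string $path): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (($row = fgetcsv($fh, 0, ',', '"', '')) !== false) {
        if (count($row) >= 2 && $row[0] !== '') {   // skip blank lines and empty keys
            yield [$row[0], $row[1]];
        }
    }
    fclose($fh);
}

// Build .po text: a header entry declaring UTF-8, then one entry per pair.
function po_from_rows(iterable $rows): string
{
    $out = "msgid \"\"\nmsgstr \"\"\n"
         . "\"Content-Type: text/plain; charset=UTF-8\\n\"\n"
         . "\"Content-Transfer-Encoding: 8bit\\n\"\n\n";
    foreach ($rows as [$msgid, $msgstr]) {
        $out .= 'msgid ' . po_quote($msgid) . "\n"
              . 'msgstr ' . po_quote($msgstr) . "\n\n";
    }
    return $out;
}
```

Usage would be along the lines of `file_put_contents('it.po', po_from_rows(rows_from_csv('it.csv')));`, once per language column you export.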

.mo

So I needed a way to generate binary .mo files from the ready-converted .po files so that gettext could translate the site.

I found this handy command that seemed to work great:

$ msgfmt -cv -o /path/to/output.mo /path/to/input.po

But for this to run, you need the GNU gettext command-line tools installed (msgfmt is one of them, and they may well be installed by default), and possibly a few other things. And even though it worked great in the terminal, I couldn’t get it to run inside exec() – and I can’t vouch for the configuration of the system on which the converter script is going to be run in the future.
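If you do want to try the msgfmt route from PHP, a wrapper like the following at least tells you why it failed: it quotes the paths with escapeshellarg(), merges stderr (where msgfmt writes its diagnostics) into the captured output, and checks the exit status. The function name is hypothetical. When a command works in the terminal but not under exec(), common culprits are a web-server PATH that differs from your shell’s, or exec() being switched off via disable_functions; passing an absolute path to msgfmt may help with the former.

```php
<?php
// Hypothetical wrapper around GNU msgfmt. Returns true on success;
// $log receives whatever msgfmt printed (stderr is merged into stdout).
function compile_with_msgfmt(string $po, string $mo, ?string &$log = null, string $msgfmt = 'msgfmt'): bool
{
    if (!function_exists('exec')) {            // e.g. disabled via disable_functions
        $log = 'exec() is not available';
        return false;
    }
    $cmd = sprintf(
        '%s -c -v -o %s %s 2>&1',
        escapeshellarg($msgfmt),
        escapeshellarg($mo),
        escapeshellarg($po)
    );
    $output = [];
    $status = 1;
    exec($cmd, $output, $status);
    $log = implode("\n", $output);
    return $status === 0;                      // 127 from the shell usually means "not found"
}

if (!compile_with_msgfmt('input.po', 'output.mo', $log)) {
    fwrite(STDERR, "msgfmt failed:\n$log\n");
}
```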

A little more googling led me to a piece of code from 2007 called php-msgfmt, which is a command-line PHP script. I couldn’t get the thing to work reliably from the command line or from PHP, but I did steal its .mo file-writing functions, which seem to be absolutely rock solid (credit given!).

Enter php.mo

php.mo is a couple of functions that take an input (.po) file and generate an output (.mo) file. Usage is easy:

<?php
    require('php-mo.php');
    phpmo_convert( 'input.po', 'output.mo' );
?>

Just like that. You can even omit the second parameter, and the output file will take the same filename as the input file, with a .mo extension.
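In a build script you’ll usually want to convert a whole directory of catalogues and know which ones failed. Here’s a rough sketch, assuming phpmo_convert() returns true on success and false on failure; `mo_path_for()` and `convert_all()` are hypothetical helpers, and the locale/*/LC_MESSAGES layout is just the common gettext convention.

```php
<?php
// Hypothetical helper: derive "<name>.mo" from "<name>.po",
// changing only the extension at the very end of the path.
function mo_path_for(string $poPath): string
{
    $mo = preg_replace('/\.po$/i', '.mo', $poPath);
    // No ".po" extension to swap: append ".mo" instead.
    return ($mo === null || $mo === $poPath) ? $poPath . '.mo' : $mo;
}

// Convert every .po file matching $pattern with the given converter
// (for example 'phpmo_convert') and return the paths that failed.
function convert_all(string $pattern, callable $convert): array
{
    $failed = [];
    foreach (glob($pattern) ?: [] as $po) {
        if (!$convert($po, mo_path_for($po))) {
            $failed[] = $po;
        }
    }
    return $failed;
}

// Usage, assuming php-mo.php is available:
// require 'php-mo.php';
// $failed = convert_all(__DIR__ . '/locale/*/LC_MESSAGES/*.po', 'phpmo_convert');
```

Taking the converter as a callable keeps the loop testable without the library, and makes it easy to swap in a different converter later.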

I’ve compared the .mo binary files generated by php.mo with those generated by Poedit, and they seem to be identical – which leads me to believe that this is production-ready.
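An exact byte comparison is a strict test, since tools can legitimately differ in details such as the hash table and where strings are placed. A more forgiving check is to read both files back and compare the msgid => msgstr pairs they contain. Here’s a minimal reader sketch that handles either byte order and ignores the optional hash table; `mo_parse()` and `mo_files_equivalent()` are hypothetical names.

```php
<?php
// Read .mo bytes into [msgid => msgstr]; handles either byte order.
function mo_parse(string $bytes): array
{
    if (strlen($bytes) < 28) {
        throw new RuntimeException('Too short to be a .mo file');
    }
    $magic = unpack('V', $bytes)[1];
    if ($magic === 0x950412de) {
        $f = 'V';                              // little-endian file
    } elseif ($magic === 0xde120495) {
        $f = 'N';                              // big-endian file
    } else {
        throw new RuntimeException('Not a .mo file (bad magic number)');
    }
    // Words after the magic: revision, string count, then the offsets of
    // the original-string and translation tables.
    $h = unpack("{$f}rev/{$f}count/{$f}orig/{$f}trans", $bytes, 4);

    $map = [];
    for ($i = 0; $i < $h['count']; $i++) {
        $o = unpack("{$f}len/{$f}off", $bytes, $h['orig'] + 8 * $i);
        $t = unpack("{$f}len/{$f}off", $bytes, $h['trans'] + 8 * $i);
        $map[substr($bytes, $o['off'], $o['len'])] = substr($bytes, $t['off'], $t['len']);
    }
    return $map;
}

// Compare two catalogues by content rather than byte-for-byte.
function mo_files_equivalent(string $a, string $b): bool
{
    $x = mo_parse(file_get_contents($a));
    $y = mo_parse(file_get_contents($b));
    ksort($x, SORT_STRING);
    ksort($y, SORT_STRING);
    return $x === $y;
}
```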

The code is on Github, for all your .po to .mo needs – enjoy!

PS. Make sure to hit ‘Watch’ on the repository if you’re interested in gettext, translations and internationalisation – I’m planning to release a bunch more PHP scripts into the php.mo project for various other useful gettext-ey functions.

26 thoughts on “php.mo – convert gettext .po files to binary .mo files in PHP, without using Poedit”

  1. Silvana Donato

    I tried Poedit in the past, but it was not working well (at least for me); this method (the one with php-mo.php), on the other hand, is very simple and quick to use.

    Thank you very much, it worked very well!

  2. Esteban Eid

    Hello Joss!

    Great lib!! I’m using it right now, but I’m not getting a valid .mo file.
    First, the size is different: it’s smaller (4 KB) than the Poedit .mo file (12 KB).
    Second, I’m not able to load this .mo file: translations are not showing, but I’m not getting any error.

    I’m thinking that maybe there’s a problem with the charset? I’m using UTF-8 in the .po file. I’ve tried converting the .po file to Latin-1, but got the same result from the conversion.

    Any hint?

    I think this is a fantastic lib; how can I help you debug it?

    Thanks a lot!

    1. Esteban Eid

      Don’t worry, I just figured it out: the problem was that my .po file is a very simple one, like this:

      msgid "Username"
      msgstr "Nombre de usuario"
      msgid "Password"
      msgstr "Contraseña"
      msgid "Login"
      msgstr "Login"

      Then I noticed that on line 92, in the case '#|' branch, the key/data is added to the hash. I don’t have that character in my .po file, so the entry was never added. I copied the same code from that case to line 139, and now it’s working greaaat!

      Thanks a lot for this lib!

      Best regards,

      Esteban

      1. Joss Post author

        Hey Esteban,

        Thanks for posting and glad you got it sorted out! Let me know if there’s anything else I can help with, or if you have any suggestions/feedback.

  3. Pingback: Converter arquivos .po no formato gettext para arquivo binário .mo com php | Felipe Marques

  4. Rommsteinz

    Hey, after many hours of googling, I finally found someone who did a great job for me.

    Thanks a lot, it works perfectly here. Keep up the good work!

  5. Gero

    Dude, this is awesome! I’ve been struggling with Poedit for a long time, both on Windows and Mac. It simply feels alien in my workflow, but automating this with this great lib gets two thumbs up.

    … However, I’d love to see the lib report at least some error messages when necessary. Poedit often enough gave me the finger over nothing, and debugging was a pain – but at least I knew when something was wrong. I couldn’t tickle your lib into giving me any kind of output. As far as I can see, the only thing I get is a “true – all clear” or a “false – go check for yourself what went wrong”, right?

    1. Joss Post author

      Interesting feedback and thanks for the comment.

      I haven’t worked with this for a while, so I’d love it if you could share any solution you come up with! Feel free to do a pull request on github too.

  6. DougW

    Hey there, we’re writing a similar utility, and I noticed a couple small things in your code that could be problematic.

    It appears to explode on spaces, which means that other kinds of whitespace might be parsed incorrectly.

    The eval statement is probably not a concern if you trust your .po files, but since I’ve heard of people building this into WordPress plugins etc., it’s probably an unnecessary risk that should be worked around. Certain strings could execute nearly anything.

    Cheers!

    1. Joss Post author

      Hey, these sound like great catches. Are you able to make any changes or edits and submit them to the git repository? I’m pretty busy with other things right now!

      1. DougW

        No problem. I forked it, made a few changes and filed a pull request back to your project. Please double check me and pull if so inclined.

        Cheers!

  7. Julian Fricker

    I’ve used the code but the mo file doesn’t seem right. If I try to check the file using msgunfmt I get lots of errors about “invalid multibyte sequence”. I downloaded the msgfmt.php this code is based on and the outcome was the same.

    I’ve tested my po files using the standard msgfmt and it works fine and msgunfmt is happy with it.

    Any ideas on how I can check what’s going wrong? My po is in French UTF-8 with lots of french characters, like this:


    #: smarty.c:2
    msgid "Ski Property"
    msgstr "Propriétés Ski"

  8. Ruben

    I just had my own go at creating something for transforming .po files to .mo files and the reverse, and published my results on github. Unfortunately I had some issues with incorrect output files. I also didn’t really like the eval in the code, so I created an (admittedly horrible-looking) function to escape and unescape strings instead of using eval. Have a look at https://github.com/rnijveld/pgt

  9. Omer Sabic

    Hello mate,
    how do you generate your .po file? I found some tips but maybe you got another useful lib for that. :)

    Thanks for publishing this.

  10. Pingback: php.mo | prosoxi.com

  11. Tajid M.

    Have you tried the online tool poeditor.com? It doesn’t need downloading, and your project, if imported in .po format, can be exported as .mo after you make the necessary changes or translations.

  12. Dave Goodchild

    Hi,
    This is a great piece of work and has been very useful, as we didn’t want to have to download the .po files from our online translation service to compile them using Poedit. Thank you for sharing.

    I did, however, have to make a few changes. The code (as of 29/01/2013) does not handle “#~” and so simply bombs out (return false mid-parse-loop). Secondly, if a string is not translated, it is compiled as empty/nothing, and the interface, once loaded using that .mo, has basically no text. I’ve made the necessary adjustment so that those strings are not compiled and gettext then falls back to the text within the _('…') call.

    Would you be interested in these fixes, if so, shall I make a fork+pull request or would you like me to email you my changes?

    Thanks again,

    Dave.

  13. Yossi

    In Poedit, empty msgstr strings (“”) in the .po file are converted to .mo assuming the msgid values.
    In phpmo_convert they are left blank, which is a real problem when you have a program with several partly translated languages, where untranslated items should get the default (=msgid) value.
    Is it possible to alter the script to have an option to make phpmo_convert fill the empty msgstr instances with the msgid value?

  14. David

    Hi all,

    First of all, congratulations to Joss for this great work. I’m developing a translation app for my company to manage .po files, and I found this amazing library to do the job.
    Like Yossi said, an empty msgstr should not be compiled to an empty string, but to the msgid value, so the default text is kept.
    I’ve modified the code to handle this case. It’s a simple change in the switch statement: it saves the last msgid value and, if the msgstr is empty, uses that msgid instead. I’ve put the code here in case it helps anyone.


    switch ($key) {
        case '#,' : // flag...
            $fuzzy = in_array('fuzzy', preg_split('/,\s*/', $data));
            // fall through
        case '#' : // translator-comments
        case '#.' : // extracted-comments
        case '#:' : // reference...
        case '#|' : // msgid previous-untranslated-string
            // start a new entry
            if (sizeof($temp) && array_key_exists('msgid', $temp) && array_key_exists('msgstr', $temp)) {
                if (!$fuzzy) {
                    $hash[] = $temp;
                }
                $temp = array();
                $state = null;
                $fuzzy = false;
            }
            break;
        case 'msgid' : // untranslated-string
            $last_msgid = $data; // remembered in case the msgstr is empty
            // fall through
        case 'msgctxt' : // context
        case 'msgid_plural' : // untranslated-string-plural
            $state = $key;
            $temp[$state] = $data;
            break;
        case 'msgstr' : // translated-string
            $state = 'msgstr';
            if (trim($data) == "\"\"") {
                // empty translation: use the last msgid instead
                $temp[$state][] = $last_msgid;
            } else {
                $temp[$state][] = $data;
            }
            break;
        default :
            if (strpos($key, 'msgstr[') !== FALSE) {
                // translated-string-case-n
                $state = 'msgstr';
                $temp[$state][] = $data;
            } else {
                // continued lines
                switch ($state) {
                    case 'msgctxt' :
                    case 'msgid' :
                    case 'msgid_plural' :
                        $temp[$state] .= "\n" . $line;
                        break;
                    case 'msgstr' :
                        $temp[$state][sizeof($temp[$state]) - 1] .= "\n" . $line;
                        break;
                    default :
                        // parse error
                        fclose($fh);
                        return FALSE;
                }
            }
            break;
    }

    It’s only the switch clause; the modified cases are the msgid and msgstr ones. I haven’t tested this in a real environment, but in a few local tests it works well.

    Best Regards,

    David
