Month: December 2018

Localization of an Android game – Chinese Chess

Our project was to localize an Android game into English, Japanese, and German using Android Studio. Isabella Sun worked on this project together with me. The game is a traditional Chinese board game, Xiangqi, which is also often called "Chinese Chess". Its historical background is the Chu–Han Contention, the years of war before the establishment of the Han Dynasty, and the game represents a battle between two armies, Chu and Han. A snapshot of the game is shown below.

The rules of Chinese Chess and Chess share some common points. For example, horses move the same way as knights do in Chess, with one exception: in Chinese Chess, horses cannot jump, so if another piece (no matter which player it belongs to) sits one point horizontally or vertically adjacent to the horse, the horse is blocked in that direction. Comprehensive rules of Chinese Chess can be found here.

Localizing this game is not as complicated as it might seem. Localizing software or a game is different from merely translating it; some adaptations, whether for cultural or linguistic reasons, may be necessary. Since non-Chinese users are often unfamiliar with the rules, I decided to add a new feature to this project: an illegal-move warning. Whenever the user makes an illegal move, a pop-up message appears, telling the user that the move is not allowed and explaining the moving rule of that piece. Here is how it works. First, I found the Java file responsible for the game logic; the rules of the game are declared there. Then I created a variable whose value corresponds to a specific piece. For example, when the user makes a wrong move with a Pawn, the value is 1, while a wrong move with a Rook sets it to 2. By passing this variable to the file responsible for the game's interface, I could read it in a switch statement that decides which error message (after adding the message texts to the strings files, of course) will be shown.
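A minimal sketch of the idea looks something like the code below. The class, method, and resource names are hypothetical stand-ins, not the actual identifiers from the game's source; the string resources would have to exist in every language's strings.xml.

```java
// Hypothetical sketch of the illegal-move warning. Names are illustrative only.
public class IllegalMoveWarning {

    // Piece codes set by the game-logic class when it rejects a move.
    public static final int PAWN = 1;
    public static final int ROOK = 2;
    public static final int HORSE = 3;

    private int lastIllegalPiece;

    public void setLastIllegalPiece(int pieceCode) {
        lastIllegalPiece = pieceCode;
    }

    // Called from the UI code: maps the piece code to a localized message resource.
    public int warningResId() {
        switch (lastIllegalPiece) {
            case PAWN:  return R.string.illegal_move_pawn;
            case ROOK:  return R.string.illegal_move_rook;
            case HORSE: return R.string.illegal_move_horse;
            default:    return R.string.illegal_move_generic;
        }
    }
}
```

The UI side could then show the pop-up with something like `Toast.makeText(context, warning.warningResId(), Toast.LENGTH_LONG).show()`, and Android picks the message text that matches the device language.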

Another adaptation we made was XML customization. XML is a markup language in which much of the main program's settings and display information is stored. As we know, a sentence written in German is usually longer than the same sentence written in Chinese, because Chinese is more condensed than most European languages. I therefore modified the XML of the game's settings page and adjusted the font sizes, so that Chinese characters appear a little larger than the text in the other languages, while for German the message and text boxes needed to be enlarged, otherwise the texts could not be fully displayed. In the localization industry it is always good practice to stay alert to text expansion, so as to avoid truncation and overlapping text. Thoroughly testing the entire game is therefore essential.
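One way to express this in Android is with locale-qualified resource folders, so each language carries its own dimensions. The folder layout is standard Android, but the resource names and values below are invented for illustration, not taken from the game:

```xml
<!-- res/values/dimens.xml : defaults used by every language -->
<resources>
    <dimen name="settings_text_size">16sp</dimen>
    <dimen name="settings_box_width">180dp</dimen>
</resources>

<!-- res/values-zh/dimens.xml : slightly larger type for the denser Chinese text -->
<resources>
    <dimen name="settings_text_size">18sp</dimen>
</resources>

<!-- res/values-de/dimens.xml : wider boxes so longer German strings are not cut off -->
<resources>
    <dimen name="settings_box_width">240dp</dimen>
</resources>
```

A layout that references `@dimen/settings_text_size` and `@dimen/settings_box_width` then adapts automatically to whichever language the device is set to.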

We tested the game by playing against the AI. However, the game's AI is very strong, or at least it seemed too strong for someone like me, who only plays Chinese Chess at a beginner level. In order to win the game and test what would happen after a victory, I went to an online Chinese Chess platform that lets users play against each other, found a highly ranked player, and started a game against him. I let him move first. After he made a move, I went back to the Android Emulator and played the same move; after the AI responded, I went back to the webpage and replied with the AI's move. Through this "indirect match" between him and the AI, I was finally able to win. It was worth the effort, because I discovered an audio file that is only played after a victory, which I had not known about before testing and winning the game.

In conclusion, this was a successful project. Of course, new features should only be added with the client's approval, so in this project we assumed that we had it. In the end, we successfully localized this Android board game and made some useful adaptations.

TAPICC – IMUG review

Content Management Systems (CMS), such as WordPress, Drupal, Joomla, and SharePoint, are widely used today. In most cases they are used to create and maintain websites, both for personal use by individuals and for commercial use by companies. When people discuss website localization, a common question is "How do you localize a website?" If a company wants to localize its website into another language, how does it get the content out of the system it manages and translated? The answer, given by Mr. Jim Compton from RWS Moravia, is API integration.

Mr. Jim Compton gave us a lecture about TAPICC during the IMUG event at Adobe headquarters. TAPICC stands for the Translation API Cases and Classes, where API is the abbreviation of application programming interface.
TAPICC is managed by the Globalization and Localization Association (GALA) and involves a number of technology companies and Language Service Providers (LSPs). According to GALA, TAPICC is "a collaborative, community-driven, open-source project to advance API standards for multilingual content delivery." In the 1990s there was a big gap between Content Management Systems and Translation Management Systems (TMS): the two technologies shared no common standards, and no system acted as a bridge between them. An integration, that is, system-to-system communication to automate the hand-off between content management and translation management, was therefore needed.
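As a purely hypothetical illustration of what such an integration means in practice, the sketch below pushes a piece of CMS content to a TMS over a REST call. The endpoint, JSON fields, and token are invented for this example; they are not the TAPICC specification itself.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative only: a CMS plugin hands content to a TMS through an HTTP API.
// Endpoint, fields, and token are hypothetical, not the TAPICC schema.
public class TranslationJobSender {
    public static void main(String[] args) throws Exception {
        String payload = """
            {
              "sourceLocale": "en-US",
              "targetLocales": ["ja-JP", "de-DE"],
              "taskType": "translation",
              "content": "<p>Welcome to our website!</p>"
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://tms.example.com/api/jobs"))  // hypothetical TMS endpoint
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer <api-token>")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("TMS replied: " + response.body());
    }
}
```

Standards such as TAPICC aim to make the shape of that payload and those endpoints predictable, so that every CMS-TMS pair does not need its own custom connector.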

The TAPICC initiative is divided into four tracks. The first track supports business metadata for supply-chain automation. TAPICC works on harmonization and compatibility, and this is to some extent the core track, since it aims to connect different management systems and platforms and to fit various business models. It also defines several task types, including localization, translation, internationalization, DTP, and so on. The payload of each task is specified as well.

The second track of TAPICC covers unit-level exchange. The difference between track two and track one is that the first track is about exchanging content between different organizations, while the second is about exchanging content between different platforms, such as TMSes or translation editors. In addition, track two is real-time and synchronous. It supports transferring data from one TMS to another, or from a machine translation system to a TMS, and vice versa.

The third track is about semantic enrichment of units. Although it has not started yet, its goal is to enrich translation memories (TM), term bases, and machine translation. The fourth track supports visual context. Desktop publishing is a common step in the localization industry, and the texts given to translators are not always in an editable text format; they can be visual content in videos or images. This track aims to make TAPICC compatible with these formats.

Benefits: although TAPICC is still under development, the benefits of applying it in the future are already clear. TAPICC solves, or will solve, the problem that content is hard to exchange across different systems. It facilitates many localization processes and makes them ready for automation, because it removes the incompatibility among these systems, so a more standardized, uniform localization workflow can be established. TAPICC not only boosts working efficiency but also brings more consistency to the workflow.

MIMUG review

One of the most common challenges for localization engineers, linguists, DTP specialists, and many other people in this industry is character handling. Some letters of the target language may not appear, or may not be shown properly, because the source and target languages use different character systems. Mr. Tex Texin, from XenCraft, a consulting firm specializing in software globalization, gave us a comprehensive and detailed introduction to Unicode, computer character systems such as ASCII, and how computers process text.

To understand encoding, we first need to understand how computers record the information we enter by pressing the keyboard. Every time we press a key, a row number and a column number are generated, and each character has a corresponding value, much like a coordinate system. To make the system support a language, we first create a character set with letters, digits, punctuation, and symbols, and each one is assigned a specific value. ASCII and Unicode are examples: ASCII defines 128 symbols, and Unicode defines far more than that.
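A quick way to see these numeric values for yourself (my own small illustration, not an example from the talk):

```java
// Print the numeric values behind a few characters.
public class CodePoints {
    public static void main(String[] args) {
        System.out.println((int) 'A');            // 65  - one of ASCII's 128 characters
        System.out.println((int) '!');            // 33  - also within ASCII
        System.out.println("象".codePointAt(0));  // 35937 (U+8C61) - only defined in Unicode
    }
}
```

The Chinese character above (the "elephant" piece from Xiangqi) has no place in ASCII at all, which is exactly why a larger character set such as Unicode is needed.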

However, different countries use different character sets because of differences in language and culture. If text is written with one character set but interpreted with another, a character's value may not exist in the second set, or it may map to a completely different character; the result is an error, typically mojibake (garbled text). Non-Roman languages such as Chinese, Japanese, Korean, and Russian are especially likely to run into such problems, because their letters are so different from those of the Roman alphabet.
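Mojibake is easy to reproduce deliberately. In the small sketch below (my own example, assuming the JVM ships the GBK charset, which standard JDKs do), Chinese text is encoded as UTF-8 and then wrongly decoded as GBK:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Demonstrate mojibake: encode with one charset, decode with another.
public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "中国象棋";                              // "Chinese Chess"
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding the UTF-8 bytes as GBK produces garbled characters.
        System.out.println(new String(utf8Bytes, Charset.forName("GBK")));

        // Decoding with the charset actually used restores the text.
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8));
    }
}
```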

The mojibake issue reminds me of one of my own experiences with character handling. I was in a team project to train a statistical machine translation (SMT) engine. The goal was to improve machine translation quality by "feeding" bilingual and monolingual texts to the engine, then tuning and testing it with sample translations. The BLEU score, which reflects translation quality, was extremely low at first; later we found that the problem was caused by garbled text in our corpus files. However, the original files uploaded to the engine were free of such errors. We asked the teammate who had prepared the files about her process: the files were downloaded as SRT files (a text format widely used for subtitles) and then converted to TMX files, but some of them had first been copied and pasted into Microsoft Word. We then realized that the different encodings used by the machine translation system and by Microsoft Word were very likely the cause of the mojibake, and they were. Even though the text we pasted into Word appeared normal, that does not mean every character keeps the same encoding when the file is exported.

In my opinion, always checking the encoding of the text editor we are using when outputting a file is a good habit for localizers. For example, when we save a .txt file, there is a small dropdown menu next to the "Save" button where the encoding can be chosen. If someone types Chinese characters and saves the file with ANSI encoding, mojibake may occur when the same file is opened in Microsoft Word as UTF-8. Therefore, it is good practice to keep using the same encoding, whenever possible, to save and open a file.
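In code, the same habit means never relying on the platform's default encoding. A small sketch of the idea (file name and charsets are just examples):

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Always name the charset explicitly when writing and reading text files.
public class ExplicitEncoding {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("notes.txt");
        Files.writeString(file, "将军!", StandardCharsets.UTF_8);   // save as UTF-8

        // Reading back with the same charset preserves the characters...
        System.out.println(Files.readString(file, StandardCharsets.UTF_8));

        // ...while reading with a mismatched charset (e.g. GBK) garbles them.
        System.out.println(new String(Files.readAllBytes(file), Charset.forName("GBK")));
    }
}
```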

Mr. Texin also mentioned text direction: some languages, for example Arabic, are written from right to left, and some even use bidirectional text. Character systems are an interesting topic in the localization industry. They are not a deeply technical problem, but they do require us to stay sensitive to such issues.