Blog – Welcome

Category: Blog

Localization of an Android game – the Chinese Chess

Posted on December 15, 2018 (updated March 6, 2019) by admin

Our project was to localize an Android game into English, Japanese, and German using Android Studio. Isabella Sun worked with me on this project. The game is Xiangqi, a traditional Chinese board game often called “Chinese Chess”. Its historical background is the Chu–Han Contention, the years of war before the founding of the Han Dynasty, and the game represents a battle between the two armies, Chu and Han. A snapshot of the game is shown below.

Chinese Chess and Chess have some rules in common. For example, a horse moves like a knight in Chess (one point horizontally or vertically, then one point diagonally outward), with one exception: a horse cannot jump. If any piece, of either side, occupies the point directly next to the horse (horizontally or vertically) in the direction it wants to move, the horse is blocked in that direction. Comprehensive rules of Chinese Chess can be found here.
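The blocking rule (often described as “hobbling the horse’s leg”) is easy to express in code. Below is a minimal Python sketch, not the game’s actual Java code: it assumes a 9×10 board addressed by (column, row), a set of occupied points, and it ignores whether the destination holds a friendly piece.

```python
BOARD_COLS, BOARD_ROWS = 9, 10

def horse_moves(col, row, occupied):
    """Return the points a horse at (col, row) can move to.

    occupied: set of (col, row) points holding any piece, of either side.
    Simplification: captures and friendly pieces at the destination are ignored.
    """
    moves = []
    for dc, dr in [(1, 2), (-1, 2), (1, -2), (-1, -2),
                   (2, 1), (2, -1), (-2, 1), (-2, -1)]:
        # The "leg" is the point one step orthogonally toward the long side
        # of the move; a piece there blocks this move.
        if abs(dr) == 2:
            leg = (col, row + dr // 2)
        else:
            leg = (col + dc // 2, row)
        dest = (col + dc, row + dr)
        on_board = 0 <= dest[0] < BOARD_COLS and 0 <= dest[1] < BOARD_ROWS
        if on_board and leg not in occupied:
            moves.append(dest)
    return moves
```

For a horse on its starting point (1, 0), a piece on (1, 1) blocks both forward moves and leaves only the sideways jump to (3, 1).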

Localizing this game is not as complicated as it might seem. Still, localizing software or a game is different from translating it: some adaptations, for either cultural or linguistic reasons, may be necessary. Since non-Chinese users are unlikely to be familiar with the rules, I decided to add a new feature to the project: an illegal-move warning. Whenever the user makes an illegal move, a pop-up message says that the move is not allowed and explains how that piece moves. Here is how it works. First, I found the Java file responsible for the game logic, where the rules of the game are declared. Then I created a variable that is set to a number corresponding to a specific piece. For example, when the user makes an illegal move with a Pawn, the variable is 1; when he or she makes an illegal move with a Rook, it is 2. By passing this variable to another file responsible for the game functions, I could read it there and use a switch statement to decide which error message to show (the message texts, of course, have to be added to the strings files first).
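The dispatch described above can be sketched as follows. This is a minimal Python illustration, not the project’s Java code (in Android, the same logic would be a `switch` on the code followed by a `getString()` lookup on a string resource). The piece codes follow the post (Pawn = 1, Rook = 2), but the resource keys, message texts, and the `STRINGS` dictionary standing in for per-locale `strings.xml` files are hypothetical.

```python
# Piece codes as described in the post (Pawn = 1, Rook = 2).
ILLEGAL_PAWN, ILLEGAL_ROOK = 1, 2

# Hypothetical stand-in for per-locale strings.xml resources.
STRINGS = {
    "en": {
        "illegal_move_pawn": "Illegal move: a pawn moves one point forward; "
                             "after crossing the river it may also move sideways.",
        "illegal_move_rook": "Illegal move: a rook moves any distance in a "
                             "straight line and cannot jump over pieces.",
        "illegal_move_generic": "Illegal move.",
    },
}

# Plays the role of the switch statement: error code -> resource key.
MESSAGE_KEYS = {
    ILLEGAL_PAWN: "illegal_move_pawn",
    ILLEGAL_ROOK: "illegal_move_rook",
}

def illegal_move_message(error_code, locale="en"):
    """Return the localized warning for an illegal-move code."""
    key = MESSAGE_KEYS.get(error_code, "illegal_move_generic")
    return STRINGS[locale][key]

print(illegal_move_message(ILLEGAL_ROOK))
```

Keeping the texts in per-locale resources, rather than in the game logic, means adding Japanese or German only requires new string entries.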

Another adaptation we made was customizing the XML files. XML is a markup language in which many of a program’s settings and much of its data can be stored. A sentence written in German is usually much longer than the same sentence written in Chinese, because Chinese is more condensed than most languages written in the Latin alphabet. I therefore modified the XML for the game’s settings page and adjusted the font size so that Chinese characters appear slightly larger than text in other languages. For German, on the other hand, the message and text boxes had to be enlarged, otherwise the text could not be displayed in full. In the localization industry, it is always good practice to stay alert to text expansion, which can cause truncation and overlapping text. Therefore, thoroughly testing the entire game is essential.
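A simple length check can catch some expansion problems before testing on a device. The helper below is hypothetical, not part of the project: it compares each translated string against a per-string character budget and flags strings that may be truncated. Character count is only a rough proxy, since the real width depends on the font and on full-width CJK glyphs, so it complements testing in the emulator rather than replacing it.

```python
def truncation_risks(translations, max_chars):
    """Return, sorted, the keys whose translated text exceeds its budget.

    translations: {string_key: translated_text}
    max_chars: {string_key: maximum characters the UI element can show};
               keys without a budget are never flagged.
    """
    return sorted(
        key for key, text in translations.items()
        if len(text) > max_chars.get(key, float("inf"))
    )

german = {"settings_title": "Einstellungen", "ok": "OK"}
chinese = {"settings_title": "设置", "ok": "确定"}
budget = {"settings_title": 10, "ok": 4}
print(truncation_risks(german, budget))   # ['settings_title']
print(truncation_risks(chinese, budget))  # []
```

Here the German “Einstellungen” (13 characters) exceeds a 10-character budget that the Chinese “设置” fits easily.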

We tested the game by playing against the AI. However, the AI is very strong, or at least it seems too strong for someone like me who plays Chinese Chess only at a beginner level. To win a game and see what would happen after a victory, I went to an online Chinese Chess platform where users play against each other, found a highly ranked player, and played against him. I let him move first. After he made a move, I went back to the Android Emulator and played the same move. After the AI responded, I went back to the web page and played the AI’s response against him. Through this “indirect match” between him and the AI, I was finally able to win the game. It was worth it, because I found an audio file that plays only after a victory, which I would never have known about without testing and winning the game.

In conclusion, this was a successful project. In a real job, new features would only be added with the client’s approval; for this project, we assumed that approval had been given. In the end, we successfully localized this Android board game and made several adaptations.

TAPICC – IMUG review

Posted on December 7, 2018 (updated March 6, 2019) by admin

Content Management Systems (CMSs), such as WordPress, Drupal, Joomla, and SharePoint, are widely used today. In most cases, they are used to create and maintain websites, both by individuals for personal use and by companies for commercial use. When people discuss website localization, a common question is: how do you localize a website? If a company wants to localize its website into another language, how does it get the content it manages translated? According to Mr. Jim Compton from RWS Moravia, the answer is API integration.

Mr. Jim Compton gave us a lecture about TAPICC at an IMUG event at Adobe headquarters. TAPICC stands for Translation API Cases and Classes, where API is short for Application Programming Interface.
TAPICC is managed by the Globalization and Localization Association (GALA) and involves a number of technology companies and Language Service Providers (LSPs). According to GALA, TAPICC is “a collaborative, community-driven, open-source project to advance API standards for multilingual content delivery.” In the 1990s, there was a big gap between Content Management Systems and Translation Management Systems (TMSs): there were no common standards between the two technologies and no system acting as a bridge between them. An integrated, system-to-system communication layer that automates the exchange between content management and translation management was therefore needed.

TAPICC’s work is divided into four tracks. The first track supports business metadata for supply chain automation. It focuses on harmonization and compatibility and is, to some extent, the core track of TAPICC, since its aim is to connect different management systems and platforms and make them work with various business models. It has also defined several task types, including localization, translation, internationalization, and DTP, and it specifies the payload of each task.

The second track of TAPICC covers unit-level exchange. Whereas the first track is about exchanging content between different organizations, the second track is about exchanging content between different platforms, such as TMSs or translation editors. In addition, track two is real-time and synchronous. It supports transferring data from one TMS to another, or from a machine translation system to a TMS, and vice versa.

The third track is about semantic enrichment of units. Although it has not started yet, its goal is to enrich translation memories (TMs), term bases, and machine translation. The fourth track supports visual context. Desktop Publishing is a common, separate step in the localization industry, and the text given to translators is not always in an editable format; it may appear as visual content in videos or images. This track aims to make TAPICC compatible with such formats.

Benefits: Although TAPICC is still under development, the future benefits of applying it are clear. TAPICC solves, or will solve, the problem that content is hard to exchange between different systems. By removing incompatibilities among these systems, it prepares many localization processes for automation, so that a more standardized, consistent localization workflow can be established. TAPICC not only boosts working efficiency but also, as an integrated tool, brings more consistency.

MIMUG review

Posted on December 7, 2018 (updated March 6, 2019) by admin

One of the most common challenges for localization engineers, linguists, DTP specialists, and many others in this industry is handling characters. Some letters of the target language may not appear, or may not display properly, because the source and target languages use different character systems. Mr. Tex Texin from XenCraft, a consulting firm specializing in software globalization, gave us a comprehensive and detailed introduction to Unicode, character sets such as ASCII, and how computers process text.

To understand encoding, we first need to understand how a computer records what we type on the keyboard. Every time we press a key, a row number and a column number are generated, and each character has a corresponding value, much like a coordinate system. To make a system support a language, we first create a character set containing letters, digits, punctuation, and symbols, and assign each one a specific value. ASCII and Unicode are two examples: ASCII defines 128 characters, while Unicode defines far more, with room for over a million code points.
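The idea that every character maps to a number is easy to see in Python, whose strings are sequences of Unicode code points. This small illustration prints each character’s code point and whether it falls inside the 128-character ASCII range.

```python
def code_points(text):
    """Return (character, "U+XXXX" label, is_ascii) for each character."""
    return [(ch, f"U+{ord(ch):04X}", ord(ch) < 128) for ch in text]

for ch, point, is_ascii in code_points("A1中"):
    print(ch, point, "ASCII" if is_ascii else "outside ASCII")
# A U+0041 ASCII
# 1 U+0031 ASCII
# 中 U+4E2D outside ASCII
```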

However, different countries may use different character sets because of differences in both language and culture. If a character typed in one character set has no equivalent in another, or is mapped to a different value there, the text is displayed incorrectly; the typical result is mojibake, or garbled characters. Languages that do not use the Latin alphabet, such as Chinese, Japanese, Korean, and Russian, are especially likely to run into this problem because their scripts are so different from Latin-based ones.
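Mojibake is easy to reproduce: encode text with one character encoding and decode the resulting bytes with another. The helper name below is hypothetical, but the codecs it uses are standard Python codecs: UTF-8, Windows-1252 (`cp1252`, a common Western encoding), and GBK (a common legacy encoding for Simplified Chinese).

```python
def mojibake(text, written_as, read_as):
    """Encode text with one codec, then decode the bytes with another.

    Undecodable bytes become U+FFFD replacement characters.
    """
    return text.encode(written_as).decode(read_as, errors="replace")

print(mojibake("café", "utf-8", "cp1252"))  # cafÃ©
print(mojibake("中文", "gbk", "gbk"))         # 中文 (same encoding on both sides)
print(mojibake("中文", "gbk", "utf-8"))       # replacement characters only
```

The first line is the classic symptom of UTF-8 text read as a Western encoding; the last shows Chinese text saved in GBK and opened as UTF-8.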

The mojibake issue reminds me of one of my own experiences with characters. I was on a team project to train a Statistical Machine Translation (SMT) engine. The goal was to improve machine translation quality by “feeding” bilingual and monolingual texts to the engine, and by tuning and testing it with sample translations. The BLEU score, which reflects translation quality, was extremely low at first; later we found that the problem was caused by garbled text in our corpus files. However, the original files uploaded to the engine were free of such errors. We asked the teammate who had prepared the files how she did it. She said the files were downloaded as SRT files (a text format widely used for subtitles) and then converted to TMX files, but she also mentioned that some of the files had first been copied and pasted into Microsoft Word. We then suspected that the different encodings used by the machine translation system and by Microsoft Word were causing the mojibake, and that turned out to be right. Even though the text pasted into Word looked normal, that does not mean every character keeps the same encoded value when the file is exported.
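A quick check like the one below could have flagged the damaged corpus lines before training. It is a hypothetical helper, not what our team actually ran. It reads the file as raw bytes and reports lines that are not valid UTF-8 or that already contain U+FFFD, the replacement character some tools insert when they hit undecodable bytes.

```python
def find_bad_lines(raw_bytes):
    """Return 1-based numbers of lines that are not clean UTF-8.

    raw_bytes: undecoded file contents, e.g. open(path, "rb").read().
    A line is flagged if it fails to decode as UTF-8 or if it contains
    U+FFFD, which means garbling was already baked in by an earlier tool.
    """
    bad = []
    for number, line in enumerate(raw_bytes.splitlines(), start=1):
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
            continue
        if "\ufffd" in text:
            bad.append(number)
    return bad

sample = ("good line\n".encode("utf-8")
          + "中文".encode("gbk") + b"\n"
          + "bad \ufffd\n".encode("utf-8"))
print(find_bad_lines(sample))  # [2, 3]
```

This check cannot catch mojibake that is itself valid UTF-8, such as “cafÃ©”. Spot-checking a sample of segments by eye is still worthwhile.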

In my opinion, always checking the encoding in the text editor before saving a file is a good habit for localizers. For example, when we save a .txt file, there is often a small drop-down menu for the encoding next to the “Save” button. If someone types Chinese characters and saves the file with ANSI encoding, mojibake may appear when the same file is opened in Microsoft Word as UTF-8. Therefore, it is good practice to use the same encoding, where possible, for saving and opening a file.
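The same habit applies to scripts that process localization files: name the encoding explicitly instead of relying on the system default. In this sketch, GBK stands in for “ANSI” on Simplified Chinese Windows, where the legacy code page is 936. The file name and helper names are made up for illustration.

```python
import os
import tempfile

def save_text(path, text, encoding="utf-8"):
    # Always pass the encoding explicitly; the OS default differs by machine.
    with open(path, "w", encoding=encoding) as f:
        f.write(text)

def load_text(path, encoding="utf-8"):
    with open(path, encoding=encoding) as f:
        return f.read()

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    save_text(path, "中文", encoding="gbk")  # like saving as "ANSI" on Chinese Windows
    same = load_text(path, encoding="gbk")    # reopened with the same encoding
    try:
        load_text(path)                       # reopened as UTF-8
        mismatch = None
    except UnicodeDecodeError as err:
        mismatch = err.reason

print(same)      # 中文
print(mismatch)  # the decoder's reason for rejecting the GBK bytes
```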

Mr. Texin also talked about text direction: some languages, such as Arabic, are written from right to left, and some text is even bidirectional. Character systems are an interesting topic in the localization industry. They are not a highly technical problem, but they require us to stay sensitive to such issues.
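Every Unicode character carries a bidirectional class that rendering engines use to lay out mixed-direction text. Python’s standard `unicodedata` module exposes it, so a simple check for right-to-left content could look like this. The helper names are illustrative.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

def has_rtl(text):
    """True if text contains right-to-left letters (classes R or AL)."""
    return any(cls in ("R", "AL") for cls in bidi_classes(text))

# Latin "a", Hebrew alef, Arabic beh
print(bidi_classes("a\u05d0\u0628"))  # ['L', 'R', 'AL']
```

A string such as “Hello שלום” mixes left-to-right and right-to-left runs, and `has_rtl` returns True for it. Strings like this need the most careful review in a localized UI.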

Video Localization Project

Posted on May 20, 2018 (updated March 6, 2019) by Weixin Mo

Subtitling and Localizing a video

Introduction:

My Desktop Publishing (DTP) final project in the Spring 2018 semester was subtitling and localizing a Chinese commercial video using Adobe After Effects. The video is a collection of several advertisements from the same web game company. These ads are among the best-known ads in China, especially among young people. They are famous mostly for their hilarious dialogue, awkward Chinese pronunciation, and garish graphics. They have gained popularity at an amazing speed in China and have produced some famous internet memes as well. The source video is in Chinese, but the original subtitles are of very low quality and incomplete, and their font styles are inconsistent, as shown in the screenshots below.

CAT Final Project Portfolio

Posted on May 16, 2018 (updated October 27, 2018) by Weixin Mo

Introduction:

The goal of this Statistical Machine Translation (SMT) project was to train Microsoft Translator Hub, Microsoft’s customizable SMT system, to assist human translation of subtitles for Marvel superhero movies. By the end, we had completed 22 rounds of training in total, with 129,869 segments added to the engine. The updated proposal presents an analysis of the pilot project, estimates for the complete project, and recommendations for adopting machine translation in your company’s translation workflow.

Translation Management System Project

Posted on May 15, 2018 (updated October 27, 2018) by Weixin Mo

Our project was to translate a commercial document for Clarins, a world-famous luxury cosmetics manufacturer. We used WorldServer as our Translation Management System (TMS) and also ran a round of pseudo-translation in WorldServer as a test. The source file was a brochure: we localized it from English to Chinese, then did the desktop publishing to produce a polished Chinese version. The deliverables include:

Source file
Target file
Pseudo Translation document
Term base
Translation memory

Our proposal can be downloaded here
The deliverable can be downloaded here
Our final presentation can be downloaded here
A blog post describing our Translation Management System (TMS) selection in detail is here
I also made an infographic comparing the TMSs; it can be downloaded here

TMS Selection and Recommendation

Posted on April 16, 2018 (updated October 27, 2018) by Weixin Mo

Why do we need a TMS?

A Translation Management System (TMS) can handle basic, repetitive work that human workers may find tedious. Generally, a TMS has its own automation features to support a project. Project managers can apply the workflows provided by the TMS instead of spending time building new ones, which boosts efficiency. A TMS can also reduce human error, since many tasks, such as sending notifications, are handled automatically.

TMS Recommendation:

I would recommend Lingotek as an excellent Translation Management System. First, Lingotek has a very user-friendly interface: it is clean and straightforward, with the toolbars on the left side of the screen. The system is highly automated, since most resources can be synchronized with projects. Lingotek also has a very useful search panel, so project managers do not need much time to find their resources.

Second, Lingotek gives users a lot of freedom in managing projects. Users can customize, add, or delete phases in a project workflow based on their needs, and they can also adjust each phase individually, so the entire process is both flexible and highly automated. Lingotek also lets users translate in its embedded workbench, where translators can edit in real time, which is very convenient.

In addition, users can create and manage linguistic resources in Lingotek. For example, users can create translation memories and term bases in the browser, and these data sets can also be downloaded and imported into CAT tools such as SDL Trados.

Finally, Lingotek is cloud-based. Because it is an online system, users do not have to install an application on their computers. The entire system is easy to access, whereas some other TMSs require users to buy a license.

Experience with XTRF

Posted on March 14, 2018 (updated October 27, 2018) by Weixin Mo

The need for translation and localization has never been so strong. As a rapidly growing number of companies seek to expand their sales overseas, translation and localization have come to play vital roles. However, translation is no longer a simple process in which someone gives you a source file and you give back the translated file. This is not a translation class, where all you need to do is translate the document your instructor gave you and turn it in. Instead, the translator is part of a vendor team and plays a role in a complicated workflow. Clients, salespeople, project managers, language providers, and many other roles work together on a project, so translation management systems are needed to automate the workflow and reduce manual processes that can be tedious or time-consuming.

XTRF is a TMS that manages globalization and localization workflows. The fundamental tasks throughout the translation process are automated. It also has separate portals for people in different roles, and these portals are connected, so that people such as PMs and vendors do not have to meet in person to get things done. For me, XTRF is a convenient tool for getting a clear overview of the whole workflow, and I will describe my experience with it from several aspects.

Interface:

The interface is very user-friendly, as shown in the picture.

It is user-friendly because roles and tasks are clearly categorized, so project managers can easily tell where to click.

Man Management:

The project manager can view, add, and edit clients’ information and assign contact persons by clicking the “Clients” button. Clients can be added there or via the “Add” tab at the top of the interface.

The vendor records are quite detailed. Besides personal information such as contact details and language combinations, the CAT tools, software, and hardware each vendor uses are also listed. This is very convenient for project managers and helps them decide which vendor to contact for a certain type of project.

In addition, XTRF has excellent email management. Emails can be sent automatically, so project managers do not need to send notifications manually to everyone involved. This feature matters even more when a project manager is in charge of a complex project, for example translating a source file into 30 languages, which may require more than one vendor.

In short, XTRF knows the workflow well, so it sends notifications whenever the next important action is needed: for example, asking for approval of a quotation, notifying the translators to start, telling the proofreaders to start reviewing, and so forth.
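The notification behavior described above can be modeled as a workflow in which each step names the role to notify when the step starts. This is a hypothetical Python model for illustration, not XTRF’s API or data model. The steps, roles, and the in-memory `outbox` that stands in for sent emails are all made up.

```python
from dataclasses import dataclass, field

# Hypothetical default workflow: (step name, role to notify when it starts).
DEFAULT_STEPS = [
    ("quote approval", "client"),
    ("translation", "translator"),
    ("review", "proofreader"),
    ("delivery", "client"),
]

@dataclass
class Project:
    name: str
    steps: list = field(default_factory=lambda: list(DEFAULT_STEPS))
    current: int = -1                            # no step started yet
    outbox: list = field(default_factory=list)  # stands in for sent emails

    def advance(self):
        """Start the next step and queue a notification for its role."""
        if self.current + 1 >= len(self.steps):
            raise RuntimeError("workflow already finished")
        self.current += 1
        step, role = self.steps[self.current]
        self.outbox.append(f"[{self.name}] to {role}: '{step}' is ready")
        return step

project = Project("brochure")
project.advance()
project.advance()
print(project.outbox[-1])  # [brochure] to translator: 'translation' is ready
```

Because the steps are a plain list, a PM could insert a phase mid-project (for example, `project.steps.insert(project.current + 1, ("DTP", "dtp specialist"))`) without restarting. This is similar in spirit to the workflow adjustments described for XTRF.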

(The list of vendors)

Workflow:

XTRF is a highly automated system that frees PMs from a lot of tedious work. For example, after a project is created, a quote can be compiled and sent automatically, with almost all the information clients need to approve it. PMs can also monitor the progress of the current project without worrying about losing control.

Another great feature is that XTRF allows project managers to customize the workflow based on their own preferences; the workflow can also be adjusted during a project, so users do not have to start over from the beginning.

Compatibility:

XTRF supports a large number of file formats, including commonly used ones such as TMX and XLIFF. However, XTRF is not a CAT tool, so it is not a platform for doing the translation itself.

Conclusion:

In conclusion, XTRF is one of the most impressive Translation Management Systems. If you are a project manager, XTRF can not only improve your efficiency with its highly automated system but also give you high-quality management of the localization process. If you are a vendor or a client, this TMS can also save you time otherwise wasted looking for contact persons and on unnecessary communication.

Website Localization Project

Posted on March 8, 2018 (updated October 27, 2018) by Weixin Mo

Our project was to localize a real website using Transifex, a proxy translation platform. To avoid any copyright issues, we chose the website of a non-profit organization, and the translated website will not be published. In this project, we used Plant With Purpose as our source website. Transifex asks users to choose a project type: a file-based project is for localizers who have the website’s pages as files, and a Live project is for those who want to localize the website in real time through JavaScript. Transifex allows a team to administer a project together, which makes collaboration convenient.

After we added the website URL to Transifex, the system automatically generated a report; however, the word count in it was incomplete because we had not yet run any scans. Transifex counts only detected strings, and the word count report is updated every time the system detects new strings. Moreover, only the words shown in the report count against the quota, so users do not have to worry about paying for words they do not want translated. This is different from Easyling, which counts every word found in the first scan, making it very hard to stay within the word limit.
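Counting only detected strings, with the total growing as new scans find new text, can be modeled as a running word count over a set of unique strings. This is a hypothetical sketch of the idea as described above, not Transifex’s actual counting algorithm. The 50,000-word quota simply echoes the plan size mentioned later in this post.

```python
class WordCountReport:
    """Hypothetical running word count over strings found by page scans."""

    def __init__(self, quota):
        self.quota = quota
        self.strings = set()  # unique detected strings

    def add_scan(self, detected_strings):
        # Strings already seen in earlier scans do not add to the count.
        self.strings.update(s.strip() for s in detected_strings if s.strip())

    @property
    def words(self):
        return sum(len(s.split()) for s in self.strings)

    def within_quota(self):
        return self.words <= self.quota

report = WordCountReport(quota=50_000)
report.add_scan(["Plant trees", "Donate now"])  # first page
report.add_scan(["Donate now", "About us"])     # second page repeats a button
print(report.words, report.within_quota())      # 6 True
```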

Translation memories can be imported and exported. This is particularly important when a source text has to be translated into dozens of target languages. Each language has its own report.
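Translation memories usually move between tools as TMX files. This sketch uses Python’s standard `xml.etree.ElementTree` to build a minimal TMX 1.4 document from (source, target) pairs. The header attributes are the ones the TMX 1.4b specification lists as required, to my understanding. The tool name and version values are placeholders, and real TMs exported from Transifex or Trados carry more metadata.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, srclang, tgtlang):
    """Return a minimal TMX 1.4 document (as a string) for (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",  # placeholders
        "segtype": "sentence", "o-tmf": "none", "adminlang": "en",
        "srclang": srclang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

print(build_tmx([("Donate now", "立即捐款")], "en-US", "zh-CN"))
```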

The most impressive feature of Transifex is live translation, which gives the user a live preview of the translated website. When entering live translation mode, the system automatically scans the current page to determine which strings need to be translated. After the scan, users can approve the detected strings or exclude some of them; approved strings are then ready for translation. All translation work can be done in the editor. In this case, we used Google Translate to machine-translate all the approved strings. Marking the translated strings as “Reviewed” is all a user needs to do to finish the translation tasks. The live translation editor can be regarded as a small CAT tool, since the work is the same as in SDL Trados. No coding is needed, so even a translator without any programming knowledge can do this on his or her own.

As mentioned before, Transifex scans only the current page, which means every link on the website has to be followed and each page translated separately. Links cannot be followed directly as on a live website: users need to click the “follow the link” button, because a single click on a link selects its text for editing instead of following it. All translated pages can be selected and reviewed using the commands in the toolbar on the right.

The final step is to publish the translated website. Transifex provides project administrators with JavaScript code that displays the translations. The administrators need to copy this code and paste it into every page that needs to be localized. Transifex also provides a staging server, staging.transifex.com, which lets localizers test the project without publishing it to the public.

Overall, Transifex is a powerful and user-friendly proxy translation tool. It is easy to use, and the whole system is highly automated, which makes it a very good choice for localization companies. However, its drawbacks are also apparent. The first is price: Transifex is much more expensive than its rivals, with the cheapest subscription at $179 per month for only 50K hosted words. For a company without much website localization business, Transifex is likely a poor choice. Second, although more than one user can localize the website at the same time, Transifex sometimes crashes when two people work simultaneously; in particular, when they are translating different pages, the automatic string detection may produce bugs.

Crowdsourcing Translation – Quality Control

Posted on March 8, 2018 (updated October 27, 2018) by Weixin Mo

Some common worries about crowdsourcing may put people off. A typical one is how companies maintain quality assurance (QA) for crowdsourced translation.

The first practice is to hire LSPs to do the final review of translations, as Facebook did when localizing its website into hundreds of languages. It is a fairly direct way to ensure high-quality translation and takes little review effort from the company, since the task is handed over to language providers. Another advantage is that professional language providers can help the company meet deadlines.
An alternative way to maintain QA is a voting system, on the assumption that the translation with the most votes is more likely to be the better one. Many translation management systems allow project managers to choose voting as the way to accept translations. Typically, project managers can set how many votes a translation needs before the system accepts it automatically.
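Threshold-based acceptance, as described above, can be sketched in a few lines. This hypothetical helper processes votes in the order they arrive and accepts the first candidate translation that reaches the vote threshold set by the project manager. Real TMSs may also weigh voters or allow down-votes.

```python
from collections import Counter

def accepted_translation(votes, threshold):
    """Return the first candidate to reach `threshold` votes, else None.

    votes: candidate translations, one entry per vote, in arrival order.
    """
    tally = Counter()
    for candidate in votes:
        tally[candidate] += 1
        if tally[candidate] >= threshold:
            return candidate  # auto-accepted
    return None  # no candidate has enough votes yet

votes = ["Translation A", "Translation B", "Translation A",
         "Translation B", "Translation A"]
print(accepted_translation(votes, threshold=3))  # Translation A
```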
For companies that want to localize their products into hundreds of languages, a difficult challenge is that it is sometimes hard to find enough translators, let alone professional reviewers. In such cases, project managers can use a round-robin test, in which each candidate grades all of the others, to identify the potentially best translator.
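One way to turn round-robin grades into a ranking is to average the grades each candidate receives from everyone else. The sketch below makes that assumption (ignoring self-grades) and is only an illustration. Other aggregation rules, such as medians or dropping the highest and lowest grade, are equally valid.

```python
from statistics import mean

def round_robin_ranking(grades):
    """Rank candidates by the average grade received from the others.

    grades: {grader: {graded_candidate: score}}; self-grades are ignored.
    """
    received = {}
    for grader, scores in grades.items():
        for candidate, score in scores.items():
            if candidate != grader:
                received.setdefault(candidate, []).append(score)
    averages = {c: mean(s) for c, s in received.items()}
    return sorted(averages, key=averages.get, reverse=True)

grades = {
    "Ann": {"Bo": 7, "Cy": 9},
    "Bo": {"Ann": 8, "Cy": 9},
    "Cy": {"Ann": 6, "Bo": 5},
}
print(round_robin_ranking(grades))  # ['Cy', 'Ann', 'Bo']
```

Here Cy averages 9, Ann 7, and Bo 6, so Cy would be the strongest candidate for the reviewer role.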