Video Game Localization: How to Find Out User Difficulty and Expectations

Video gaming is fast becoming the most popular recreational pastime globally. The gaming market was reportedly worth $138 billion (USD) in 2018 and has been predicted to grow at a compound annual growth rate of 10%, reaching $180 billion (USD) by 2021.

As a result, more users around the world want popular games to be translated and adapted to their regions, and localization, a subsector of the gaming industry, is growing rapidly in response. Its goal is to create a smoother playing experience for the end user by taking into account their specific cultural context, while being faithful to the source material.

The problem many game localization companies face, however, is the challenging task of analyzing user reviews for insights that can inform the localization process. This costs the company time and money, and delays users from fully enjoying their chosen game.

This article will demonstrate how such analysis can be done efficiently, by looking at a project carried out for Allcorrect as a part of the Data Analyst Practicum by Yandex bootcamp this year. Specifically, it presents examples of using some popular Python packages for dealing with text data in multiple languages, such as deep_translator, langdetect, and NLTK.

The data

The data used for the report in this article is provided by Allcorrect. It includes game ID, user score and review text in its original language. The games are on both mobile and PC platforms, which have different user scoring systems. The game IDs have been anonymized.

To prepare for the analysis, reviews that mention localization need to be identified. In this project, there are a total of 140 keywords in 27 languages used to filter the data. One of the challenges in this process is the differences among languages. Users might mention the word 'localization' in their reviews, but use more than one expression in their original language. Therefore, having only one translation for 'localization' in the keywords will miss the reviews containing other expressions.

Taking Chinese as an example, the keywords Chinese, localization and English can be translated as 中文, 本土化 and 英语 respectively. However, in the user reviews, we also find 华语, 汉化 and 英文 used to refer to the same concepts. These need to be included in the keywords so as not to miss important localization reviews. The same applies to other languages.
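
The filtering idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the keyword dictionary holds a handful of made-up entries rather than the project's 140 keywords, and the names LOCALIZATION_KEYWORDS and mentions_localization are hypothetical.

```python
# Hypothetical keyword list: a few entries per language, including synonyms.
# The real project used 140 keywords in 27 languages.
LOCALIZATION_KEYWORDS = {
    "en": ["localization", "translation", "english"],
    "zh": ["中文", "华语", "本土化", "汉化", "英语", "英文"],
}

def mentions_localization(review, keywords=LOCALIZATION_KEYWORDS):
    """Return True if the review contains any keyword in any language."""
    text = str(review).lower()
    return any(kw in text for kws in keywords.values() for kw in kws)

sample_reviews = [
    "求汉化",
    "Great game, love the graphics",
    "Please add a Russian translation",
]
localization_reviews = [r for r in sample_reviews if mentions_localization(r)]
print(localization_reviews)
```

Listing several expressions per concept is what lets the first sample review, which uses 汉化 rather than 本土化, pass the filter.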

After filtering the data with the keywords, slightly over 1% of the reviews turned out to be about localization (a total of 152,447 records). In what follows, we will demonstrate how to carry out language detection, translation, and emoji conversion on text data. Then, we'll discuss how to use a logistic regression model to reveal the dynamics of positive and negative reviews. Finally, we'll examine the most common reasons for negative reviews. (Where there are code examples, the variable reviews is used to refer to the localization review dataset.)

Useful packages, tools and functions

Before diving into the details of the analysis, let's briefly introduce the most useful Python packages that were used in the analysis of large texts in multiple languages. This section will list the packages used, as well as some useful functions that readers could modify and use.

1) langdetect is a package used for language detection. To apply functions to a large dataset, swifter is a useful package that speeds up pandas processing. Below you'll find a demonstration of how the two work together.

# install the packages 
!pip install langdetect
!pip install swifter
# import the packages
from langdetect import detect
from langdetect import DetectorFactory
import swifter
# to avoid unstable results every time the language detection function is run
DetectorFactory.seed = 0
# define a function that can be applied to the text column 
def detect_language(x):
    try:
        language = detect(x)
    except Exception:
        # detection fails on empty or non-linguistic text (e.g. only digits or emojis)
        language = 'Other'
    return language
# an example of applying the function using swifter on a dataframe named "reviews"
reviews['language'] = reviews['text'].swifter.apply(detect_language)

2) GoogleTranslator, from the Python deep_translator package, is a useful tool for translating review text. Because GoogleTranslator does not work when the text exceeds a certain length, a try-except comes in handy when defining the function to apply to the dataframe. Below we demonstrate how to use this package to define a function to translate text data.

# install and import the package
!pip install -U deep_translator
from deep_translator import GoogleTranslator
# define the function to translate the text column
def translate(x):
    try:
        translation = GoogleTranslator(source='auto', target='en').translate(x)
        return translation
    except Exception:
        # e.g. the text is too long for the translator
        return "not translated"
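
Instead of giving up on long reviews, one option is to split them into shorter pieces and translate each piece separately. The sketch below is an assumption-laden illustration, not part of the original project: the helper names split_into_chunks and translate_long are hypothetical, the 4500-character limit is a guess that should be checked against the translator's documented maximum, and the translation function is passed in as a parameter so the helper does not depend on any particular service.

```python
def split_into_chunks(text, max_chars=4500):
    """Split text on whitespace into pieces no longer than max_chars."""
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # a single over-long word is kept as its own chunk
            current = word
    if current:
        chunks.append(current)
    return chunks

def translate_long(text, translate_fn, max_chars=4500):
    """Translate each chunk with translate_fn and join the results."""
    pieces = split_into_chunks(text, max_chars)
    return " ".join(translate_fn(piece) for piece in pieces)

# str.upper stands in for a real translation function in this demo
print(translate_long("aa bb cc dd", str.upper, max_chars=5))
```

In practice, translate_fn could be the translate function defined above. Splitting on whitespace can cut a sentence in half, so translation quality may suffer at chunk boundaries.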

3) Emojis are often used in review text. As they reflect users' feelings, we may at times want to keep them. One way to retain such data for text analysis is to convert them into text. This is where the Python package emoji comes in handy. The code below demonstrates the process of using this package to define a function to convert the emojis to text data.

# install and import the package
!pip install emoji
import emoji
# define a series of functions 
# first, a function that returns True if the character is an emoji and False otherwise
# (emoji.is_emoji requires emoji 2.0 or later; older releases exposed a UNICODE_EMOJI dictionary instead)
def is_emoji(s):
    return emoji.is_emoji(s)
# define a function that will add a space before each emoji so that consecutive emojis are separated
def add_space(text):
    return ''.join(' ' + char if is_emoji(char) else char for char in text).strip()
# convert emoji to text
def convert_emojis(text):
    try:
        return emoji.demojize(text)
    except Exception:
        # leave the value unchanged if it cannot be converted
        return text
# an example of applying the above functions 
reviews['text'] = reviews['text'].apply(add_space)
reviews['text'] = reviews['text'].apply(convert_emojis)

The tools described above are just some of the ones that proved useful when preparing the dataset for further analysis. With those in place, let's move on to discuss the analysis.

Top 10 requested languages and games

Applying the language detection function to the review text column allows us to find out the most popular languages in which the localization reviews are written.

The graph above shows the top 10 languages with the most localization reviews: Simplified Chinese, Korean, Russian, Turkish, English, Spanish, Italian, French, Thai, and Japanese. Examination of the text data further reveals the prevalence of user requests for local versions, or for improvements to existing local versions, of games in these languages. Such information provides useful insights as to which languages the game localization companies should focus on in their strategic development plans.

We also generated the top 10 games (in terms of the number of their localization reviews) below.

The graph above shows the games the company should focus on when developing localization plans.

Further analysis of reviews for a particular game should reveal users' preferences, such as whether the game should be localized into a particular language or whether existing localized versions should be improved.

User sentiments in localization reviews

To distinguish the user sentiment (positive or negative feelings) in the reviews, reviews were labeled as negative (score 1 or 2 for mobile games and 0 for PC), neutral (score 3), or positive (4 or 5 for mobile games and 1 for PC). A breakdown of review sentiments is shown below.

Among all the localization reviews, over 65% were positive, approximately 23% were negative, and 11% were neutral. It's good news that the majority of the reviews mentioning localization are positive. Our focus here is to understand the dynamics of positive and negative reviews, particularly the reasons for negative reviews. To that end, the neutral reviews have been removed, and a 10% slice of the data has been translated and used to build a logistic regression model. The resulting dataframe has the following structure:

['game_id' 'text' 'score' 'language' 'positive' 'english']
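
The labeling rule described at the start of this section can be written as a small function. This is a sketch only: the dataset description does not say how the platform of each review is stored, so the platform argument and its values "mobile" and "pc" are assumptions made for illustration.

```python
def label_sentiment(score, platform):
    """Map a user score to 'negative', 'neutral' or 'positive'.

    Assumes platform is 'pc' (scores 0 or 1) or 'mobile' (scores 1 to 5).
    """
    if platform == "pc":
        return "positive" if score == 1 else "negative"
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"

# the binary 'positive' column is 1 for positive reviews and 0 for negative ones,
# after the neutral reviews have been dropped
labels = [label_sentiment(s, p) for s, p in [(5, "mobile"), (3, "mobile"), (0, "pc")]]
print(labels)
```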

Examining positive and negative reviews

After the reviews had been translated into English and the emojis converted into text, the positive and negative review texts were used to generate wordclouds to highlight any patterns. (Python has a wordcloud library for this.) First, a wordcloud for positive reviews.

Words such as 'great', 'good', 'fun', 'better' stood out, but words like 'difficult' and 'problem' were also present. Expressions like 'language support', 'turkish language', and 'chinese' indicate the need for specific languages.

Next, let's see the negative reviews wordcloud.

Again we see words like 'good' stand out. Words such as 'problem', 'bad', 'need', and "can't" are likely to indicate issues faced by users.

Despite the above observations, wordclouds alone show no distinct differences in patterns between positive and negative reviews. A logistic regression model was therefore built to reveal the underlying dynamics.

A logistic regression model to detect positive or negative review patterns

A logistic regression model is easy to interpret and quick to train. It also performs well on the sparse matrix of vectorized words.

The goal of fitting a logistic model here is not to predict the sentiment of reviews, however. It is believed that the score given by the user reflects their satisfaction level more accurately, whereas the role of their text reviews is to provide additional information or explanation. Therefore, the main goal is to reveal the underlying text patterns.

Before using the review text column and the binary positive column to build a simple logistic regression model, the text review column needs to be processed. The function used to process the text data is shown below.

The key tools needed for this function are provided by the NLTK library: stopwords, word_tokenize, and WordNetLemmatizer:

# import the tools used below (the NLTK resources must be downloaded once with nltk.download)
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
# define a function to further process the reviews text in english
def process(text):
  # tokenize the text
  tokens = word_tokenize(text)
  # convert to lower case
  tokens = [w.lower() for w in tokens]
  # remove punctuation from each word
  table = str.maketrans('', '', string.punctuation)
  stripped = [w.translate(table) for w in tokens]
  # remove remaining tokens that are not alphabetic
  words = [word for word in stripped if word.isalpha()]
  # filter out stop words
  stop_words = set(stopwords.words('english'))
  words = [w for w in words if w not in stop_words]
  # apply lemmatization 
  words = [lemmatizer.lemmatize(word) for word in words]
  text = " ".join(words)
  return text
# then we get a function that can be applied to the text column.

After processing the text column, the first five rows of the data used for the logistic regression model look like this:

   english                                          positive
0    game top thumbsup thumbsup clappinghands clapp...    1
1    indonesian facewithrollingeyes    0
2    love find hidden object game like word must co...    1
3    nt italian    1
4    problem change language mother since english u...    1

Let's now move on to the steps of fitting a logistic model to the data.

The data frame is split into train and test sets. 80% of the data will be used for training, and 20% will be used for testing. Alternatively, the train_test_split method can be used to split the data, but here let's try another way:

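# random split train and test data to 80% and 20%
# np.random.rand draws uniform numbers in [0, 1), so about 80% of rows fall at or below 0.8
import numpy as np
index = df.index
df['random_number'] = np.random.rand(len(index))
train = df[df['random_number'] <= 0.8]
test = df[df['random_number'] > 0.8]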
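
A split based on unseeded random numbers gives different train and test sets on every run. If reproducibility matters, a seeded shuffle of the row positions is one alternative. The sketch below uses only the standard library; the function name split_indices and the seed value are illustrative assumptions, not part of the original project.

```python
import random

def split_indices(n_rows, train_fraction=0.8, seed=0):
    """Shuffle row positions with a fixed seed and cut them into train and test."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))
```

With a pandas dataframe, the two index lists could then be passed to df.iloc to select the rows. Unlike the threshold approach, this yields an exact 80/20 cut rather than an approximate one.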

To perform logistic regression analysis on text data, the text first needs to be tokenized and vectorized using a tool from the Scikit-learn library, CountVectorizer.

# import the vectorizer and the model
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
# count vectorizer:
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train['english'])
test_matrix = vectorizer.transform(test['english'])

# create an instance of the class LogisticRegression 
lr = LogisticRegression()
# split target and independent variables 
X_train = train_matrix
X_test = test_matrix
y_train = train['positive']
y_test = test['positive']
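
To make clear what count vectorization produces, here is a plain-Python sketch of the same idea: build a vocabulary, then count how often each vocabulary word occurs in each document. It is a simplified illustration (whitespace tokenization, dense lists instead of a sparse matrix), and the function name bag_of_words is hypothetical.

```python
def bag_of_words(docs):
    """Return a sorted vocabulary and one row of word counts per document."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    position = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for word in doc.split():
            row[position[word]] += 1
        matrix.append(row)
    return vocab, matrix

vocab, matrix = bag_of_words(["good game good", "bad translation"])
print(vocab)
print(matrix)
```

Each column corresponds to one word, which is why the fitted model later has one coefficient per word.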

Two hyperparameters are fine-tuned in the model: penalty and C. GridSearchCV is used to find the optimal values for them.

# Let's find out which hyperparameters give the best model results.
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
# combinations that the solver does not support fail to fit and are scored 0 (error_score=0);
# recent scikit-learn versions expect None instead of the string 'none'
penalty = ['l1', 'l2', 'elasticnet', 'none']
c_values = [100, 10, 1.0, 0.1, 0.01]
grid = dict(penalty=penalty,C=c_values)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
grid_search = GridSearchCV(estimator=lr, param_grid=grid, n_jobs=-1, cv=cv, scoring='accuracy',error_score=0)
grid_result = grid_search.fit(X_train, y_train)

# print the best hyperparameter values
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Best: 0.726493 using {'C': 0.1, 'penalty': 'l2'}

It looks like the best results are achieved with C=0.1 and the 'l2' penalty (the library's default penalty). Those will be used as hyperparameters.

# fit the model on data
lr = LogisticRegression(C=0.1, penalty='l2', random_state=0)
final_model = lr.fit(X_train, y_train)
# make predictions
predictions = final_model.predict(X_test)

The result is a successfully built and trained simple logistic regression model. Checking the accuracy score for both training and test data shows that the model has an overall 78% prediction accuracy on the training data, and 73% accuracy on the testing data. Our model is slightly overfitted.
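
The accuracy figures quoted above are simply the share of reviews whose predicted label matches the true label. A minimal plain-Python version of that calculation is sketched below; the function name accuracy and the toy labels are illustrative, and scikit-learn's accuracy_score computes the same quantity.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# toy labels: three of the four predictions are correct
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Comparing this number on the training and test sets is what reveals the gap that points to slight overfitting.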

Discriminating words for positive and negative reviews

Using the logistic regression model, words or patterns that feature in the positive or negative reviews can be revealed. Let's look at the 10 most discriminating words for both types of reviews. This is done by looking at the largest and smallest coefficients, respectively.

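# map each word in the vocabulary to its coefficient
# (get_feature_names_out replaces get_feature_names in recent scikit-learn versions)
feature_to_coef = {
    word: coef for word, coef in zip(
        vectorizer.get_feature_names_out(), final_model.coef_[0]
    )
}
print("Top 10 discriminating words for positive reviews:")
for best_positive in sorted(
    feature_to_coef.items(),
    key=lambda x: x[1],
    reverse=True)[:10]:
    print(best_positive)
print("\nTop 10 discriminating words for negative reviews:")
best_negative_list = []
for best_negative in sorted(
    feature_to_coef.items(),
    key=lambda x: x[1])[:10]:
    print(best_negative)
    # keep the words for the later examination of negative reviews
    best_negative_list.append(best_negative[0])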

Top 10 discriminating words for positive reviews:
('excellent', 1.2318797837182207)
('perfect', 1.0922819266631894)
('cool', 1.0645717900879892)
('nice', 1.004451706604957)
('great', 0.9375519484038363)
('fun', 0.9196472963620572)
('hope', 0.890851739223927)
('best', 0.7966138811023504)
('super', 0.775542710090151)
('wonderful', 0.7670213758020292)
Top 10 discriminating words for negative reviews:
('rubbish', -0.8713436190047102)
('uninstall', -0.8652553696990095)
('terrible', -0.8452426468565963)
('uninstalled', -0.8315787570336696)
('refund', -0.7895618598247778)
('shit', -0.7562280034676745)
('junk', -0.7239565503012998)
('poor', -0.7209821924340266)
('garbage', -0.6243783857945131)
('delete', -0.6128080797008089)

The 10 most discriminating negative words have been saved for when we examine the negative reviews.
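
One way to read these coefficients: in a logistic regression on word counts, each additional occurrence of a word multiplies the odds of a review being positive by the exponential of that word's coefficient, with all other counts held fixed. The short sketch below applies this to two coefficients copied from the output above; the helper name odds_multiplier is hypothetical.

```python
import math

# coefficients copied from the model output above
coefficients = {
    "excellent": 1.2318797837182207,
    "rubbish": -0.8713436190047102,
}

def odds_multiplier(coef):
    """Factor by which one extra occurrence of a word multiplies the odds of a positive label."""
    return math.exp(coef)

for word, coef in coefficients.items():
    print(f"{word}: odds multiplied by {odds_multiplier(coef):.2f}")
```

So one occurrence of 'excellent' raises the odds of a positive label by a factor of roughly 3.4, while one occurrence of 'rubbish' cuts them to roughly 0.4 of their previous value.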

Examine the negative reviews

The most important piece of information that the company wants to know is the reasons behind the negative reviews (involving localization). Therefore, all the negative localization review text data has been translated into English for closer examination. The key method used for this purpose is word concordances, applied to a combination of the localization keywords and the discriminating negative words from the logistic regression model.

The negative review text column has been processed in the same way as for the logistic model. A column has also been added to show the number of words in the reviews.

Using this data, it is possible to look at the distribution of review length, the top languages in which those negative reviews were written, as well as which games they refer to, before moving on to the negative review characteristics.

Distribution of length of reviews

There are a total of 31,631 negative reviews about localization in the dataset. Below are two histograms showing the distribution of review length in number of words. The first shows all reviews, and the second has the outliers removed.

From the above histogram, we can determine that most negative reviews are under 50 words in length. We will use this information to filter the reviews when examining their patterns using the method of word concordances.
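
The length filter mentioned above can be sketched in plain Python. Counting words by splitting on whitespace and the 50-word cut-off follow the description in the text; the function names word_count and keep_short are illustrative assumptions.

```python
def word_count(text):
    """Number of whitespace-separated words in a review."""
    return len(str(text).split())

def keep_short(reviews, max_words=50):
    """Keep only the reviews with fewer than max_words words."""
    return [review for review in reviews if word_count(review) < max_words]

sample = ["no russian language", "word " * 60]
short_reviews = keep_short(sample)
print(short_reviews)
```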

Top languages and games for negative reviews

Will the top 10 languages for negative reviews differ from those for all reviews? Let's take a look.

As the graph above demonstrates, the top 10 languages dominating the negative reviews are Simplified Chinese, Russian, Korean, Turkish, English, Italian, Spanish, French, Japanese, and German. Compared with the previous ranking for all reviews, German has replaced Thai, and a few languages have swapped positions.

Next, let's see what 10 games have the most negative reviews about localization.

The above shows the 10 games that have the most negative reviews that mention localization.

Although not all of these reviews are negative because of localization quality (the filtering of localization reviews wasn't perfect), this data is still useful for localization companies to conduct further investigations into those games.

Examining the negative reviews using word concordances

To examine the text reviews using word concordances, we need to first create a corpus for the review text, instantiate an NLTK text object using the corpus, and then call the concordance method. Below is a code example of this procedure.

# import NLTK and its tokenizer
import nltk
from nltk.tokenize import word_tokenize
# First, define a function to get corpus out of a text column
def corpus(column):
  words_list = []
  for text in column:
      # str() guards against non-string values such as NaN
      words = word_tokenize(str(text))
      words_list += words
  return words_list

# build the corpus and create an NLTK text object
# (a separate variable name avoids overwriting the corpus function)
review_corpus = corpus(df['english'].str.lower())
text = nltk.Text(review_corpus)
# find the first 20 lines of matches that have 'language'
text.concordance("language", lines=20)

Using this method, concordances for the following words have been generated:

  • language
  • translation
  • localization/localisation
  • problem
  • difficult
  • rubbish
  • terrible
  • [censored]
  • grammar

These are a mixture of the localization keywords, the words observed in the wordclouds, and the discriminating negative words generated from the logistic regression model. The list can certainly be expanded depending on the information being sought.

For each concordance, we have generated 20 matching lines for further examination. Below is an example of such concordances matching the word 'language'.

Displaying 20 of 11812 matches:
sorship and newtering all kinds of language in the game chat , and other such 
then returned to the whole foreign language , do n't you hurry up and fix it b
 and fix the problem . the italian language is missing uninstalled if there is
uninstalled if there is no turkish language , how do i say it ? i did not unde
 russian ? why is there no russian language ? please translate it into korean 
ithout chinese there is no russian language in the game . don ’ t talk about t
uld also know that by changing the language in the launcher , you can get diff
ame 's website ) , put the english language and get goodies . forget about cam
e only \ `` experts \ '' . turkish language support should be added immediatel
ing why do n't you have vietnamese language ? good game without vietnamese is 
tack screens are garbled , and the language becomes incomplete , which is very
thing if you please add the arabic language игра теряет рентабельность , попул
very big minus there is no russian language could you please put the language 
 language could you please put the language of molière thank you . there is a 
e it . because there is no russian language ! everything is in english . i kno
 a little bit . please add russian language : folded_hands : . i don ’ t care 
wnloading the game , i went to the language patch , clicked several yellow lin
xed than this . when is the polish language ? ? ? ? ? ? ? i give two because i
in french and we ca n't change the language otherwise it 's cool at the beginn
iterally saying words in their own language that can mean colors of \ '' slow\
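
For readers unfamiliar with concordances, the idea is simply to show every occurrence of a keyword together with a window of surrounding words. The sketch below is a simplified, token-based illustration in plain Python; NLTK's concordance method works on character widths and formats its output differently, and the function name kwic is hypothetical.

```python
def kwic(tokens, keyword, width=3, lines=20):
    """Return up to `lines` (left context, keyword, right context) tuples."""
    results = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            results.append((left, token, right))
            if len(results) == lines:
                break
    return results

tokens = "please add russian language to the game".split()
for left, word, right in kwic(tokens, "language", width=2):
    print(f"{left} [{word}] {right}")
```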

Examination of the concordances for all the selected keywords shows that, although most localization-related reviews mention the lack of the users' native language in the games, a sense of urgency can also be detected.

This can be seen in, for example:

  • "When is the Polish language???????", with many question marks for emphasis
  • the "!" after "there is no Russian language!"
  • "please add/put language of ..."
  • "Turkish language support should be added immediately"
  • "don't you hurry up and fix it"

The analysis also reveals the difficulty users experience when playing games without local language support, for example:

  • "without Vietnamese it is too difficult to play"
  • "difficult to understand because it is not in..."
  • "difficult to understand instructions"
  • "the method of Japanese mod is difficult"
  • "Thai language... difficult ... to be able to play"

Users also noted the issue with localization quality, for example:

  • "translation is rubbish"
  • "terrible translation"
  • "The Chinese localization is rubbish, even worse than google translation"
  • "weird grammar"
  • "can not play because of missing grammar"

and potential cultural conflict, for example:

  • "discriminate against Chinese"

Furthermore, given the concern that not all reviews that mention localization are negative because of a specific localization problem, it is important to find out what negative comments are being made about localization. Therefore, the following function might be useful:

# define a simple function to count how many rows in the review column contain a particular word or expression
def cal_keyword(keyword, column):
  count = 0
  for value in column: 
    if value.find(keyword) != -1:
      count +=1
  return count 

The function can of course be applied to find the frequency of any expressions, such as 'language sucks', 'bad grammar', or 'poor translation', to make the search more specific. The function and its modifications will allow the analyst to quickly locate reviews of particular patterns and perform further analysis on them.

For example, applying the function using 'bad translation' reveals there are 48 matching records. A possible next step would be to find out which reviews these are, and which games and languages they concern.

# count the reviews that contain 'bad translation'
cal_keyword('bad translation', df['english'])
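
The follow-up step suggested above, finding which games and languages the matching reviews belong to, could be sketched as follows. This is an illustration using plain Python dictionaries instead of the project's dataframe: the field names game_id, language and english follow the dataframe structure shown earlier, while the function name keyword_breakdown and the sample records are made up.

```python
from collections import Counter

def keyword_breakdown(records, keyword):
    """Return the matching records plus counts per game and per language."""
    matches = [r for r in records if keyword in str(r["english"])]
    games = Counter(r["game_id"] for r in matches)
    languages = Counter(r["language"] for r in matches)
    return matches, games, languages

# made-up sample records for illustration only
sample_records = [
    {"game_id": "g1", "language": "ru", "english": "bad translation everywhere"},
    {"game_id": "g1", "language": "tr", "english": "really bad translation"},
    {"game_id": "g2", "language": "ru", "english": "no russian language"},
]
matches, games, languages = keyword_breakdown(sample_records, "bad translation")
print(games.most_common())
print(languages.most_common())
```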


The analysis discussed in this article reveals the most popular languages among the reviews, as well as which games received the most reviews mentioning localization. This provides useful information for the localization company on what to focus on in its strategic planning.

An investigation of the positive and negative reviews translated into English shows that they exhibit similar structures in language use, both noting the lack of games in a specific language and the difficulty this causes gamers.

A notable difference is that, in the positive reviews, the lack of a local language version may not have affected enjoyment of the game. In the negative reviews, having no local language version or a bad translation adds to the barriers to gameplay and in some cases even arouses a feeling of discrimination, which explains the urgency and frustration detected.

To sum up, the top reasons for negative sentiments are:

  • lack of specific language
  • bad/poor/terrible translation
  • difficulty understanding or playing in English
  • repeated requests to add language not addressed
  • discrimination/stereotypes
  • bad/awkward localization
  • wrong, bad, or weird grammar and punctuation use

The tools and functions described in this article allow the company to quickly identify which languages and games they should pay particular attention to in their localization plans. An easy-to-apply method has also been provided, which allows the company to examine the reasons for negative reviews. Using the procedure demonstrated, it is also possible for the localization company to further examine in which languages and games problems such as poor translation or even stereotypes persist, and take action accordingly.

