Microsoft Azure Translator API call in R
DATE: 2021-03-01
AUTHOR: John L. Godlee
A colleague was having trouble constructing an API call in R to call the Microsoft Azure Translator. They had lots of household survey responses in Portuguese that they wanted to translate to English for analysis. There are some examples[1] in the Microsoft Azure documentation showing how to call the API using C#, Go, Java, Node.js and Python, but nothing for R. Some R packages for using Azure Translator already exist:
- {translateR}[2] - Google and Microsoft API access
- {AzureCognitive}[3] - larger scope than just translation. Officially endorsed?
- {mstranslator}[4] - abandoned?
but as Azure Translator uses a conventional RESTful API, it's also possible just to use the {httr} package[5].
3: https://github.com/Azure/AzureCognitive
4: https://github.com/chainsawriot/mstranslator
5: https://github.com/r-lib/httr
Using Azure Translator requires setting up an Azure account in order to obtain an API key; there is more documentation on that here[6]. As of February 2021 there is a free tier for Azure Translator which offers up to 2 million characters of translation per month, along with a few other features.
In R, first load packages and create some text to translate. Here there are two sentences, each in English and Portuguese:
# Packages
library(httr)

# Create some example Portuguese (used Google Translate)
engl <- "This is some test text. The big train had black smoke."
port <- "Este é um texto de teste. O grande trem tinha fumaça preta."
engl2 <- "Cardboard boxes are easy to flatten"
port2 <- "Caixas de papelão são fáceis de achatar"
Then, define keys, endpoints and parameters for the API call. The key, endpoint and location can be retrieved from your Azure portal.
key <- "XXX"
endp <- "https://api.cognitive.microsofttranslator.com"
location <- "global"
path <- "translate"
apiv <- "3.0"
to_lang <- "en"
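Rather than pasting the key into a script, it can be read from an environment variable. This is only a minimal sketch: the variable name `AZURE_TRANSLATOR_KEY` is my own invention, not something Azure defines, and the `Sys.setenv()` line is there purely so the example runs on its own. Normally the variable would be set outside the script, for example in `~/.Renviron`.

```r
# "AZURE_TRANSLATOR_KEY" is a hypothetical name chosen for this example.
# Set here for demonstration only; normally defined in ~/.Renviron.
Sys.setenv(AZURE_TRANSLATOR_KEY = "XXX")

# Returns NA if the variable is not set at all
key <- Sys.getenv("AZURE_TRANSLATOR_KEY", unset = NA)

# Fail early with a clear message if the key is missing or empty
if (is.na(key) || !nzchar(key)) {
  stop("AZURE_TRANSLATOR_KEY is not set")
}
```

This keeps the key out of version control if the script is shared.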
Create the headers:
heads <- c(
  "Ocp-Apim-Subscription-Key" = key,
  "Ocp-Apim-Subscription-Region" = location,
  "Content-type" = "application/json"
)
This is the part that took me some trial and error to figure out: using nested lists to build a JSON-like structure that can then be converted to JSON for the request body. The Azure documentation states that the request body should follow this structure:
[
  {
    "Text" : "Hello, what is your name?"
  },
  {
    "Text" : "My name is John"
  }
]
So in R, that's an unnamed list containing two other lists, each with a single element named "Text" that holds the character string to translate. In R:
input <- list(port, port2)
input_list <- lapply(input, function(x) {
  list("Text" = x)
})
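To check that the nested list really does serialise to the structure in the Azure documentation, it can be converted to JSON locally before anything is sent. This is a minimal sketch using {jsonlite}, which as far as I understand is what {httr} relies on when `encode = "json"` is set. The `auto_unbox = TRUE` argument makes length-one vectors come out as plain strings rather than one-element arrays.

```r
library(jsonlite)

port <- "Este é um texto de teste. O grande trem tinha fumaça preta."
port2 <- "Caixas de papelão são fáceis de achatar"

# Same nested list structure as used for the request body
input_list <- lapply(list(port, port2), function(x) {
  list("Text" = x)
})

# Serialise locally; no API call is made
body_json <- toJSON(input_list, auto_unbox = TRUE)

# Pretty-print to compare against the documented structure
print(prettify(body_json))
```

If the printed JSON is an array of objects, each with a single "Text" key, the body has the expected shape.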
Construct the query using httr::POST():
result <- POST(
  endp,
  path = path,
  query = list(
    `api-version` = apiv,
    to = to_lang
  ),
  body = input_list,
  encode = "json",
  add_headers(.headers = heads)
)
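With lots of survey responses, it may be necessary to send the text in several requests rather than one, since the API limits how much a single request can contain. The sketch below only covers splitting a character vector into batches. The helper name `chunk_texts` and the batch size of 100 are my own assumptions, so check the Azure documentation for the actual per-request limits.

```r
# Hypothetical helper: split a character vector into batches of at most
# `max_items` strings. The default of 100 is an arbitrary assumption,
# not a documented Azure limit.
chunk_texts <- function(texts, max_items = 100) {
  stopifnot(max_items >= 1)
  # Assign each element a batch number: 1, 1, ..., 2, 2, ...
  groups <- ceiling(seq_along(texts) / max_items)
  unname(split(texts, groups))
}

# Made-up example data standing in for survey responses
responses <- paste("Resposta", 1:250)
batches <- chunk_texts(responses, max_items = 100)

# Number of strings in each batch
lengths(batches)
```

Each batch could then be turned into its own nested list and sent with a separate POST() call.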
The result is returned as a JSON string, so R needs to parse it to return a similarly nested list structure:
[
  {
    "detectedLanguage" : {
      "language" : "pt",
      "score" : 1.0
    },
    "translations" : [
      {
        "text" : "This is a test text. The big train had black smoke.",
        "to" : "en"
      }
    ]
  },
  {
    "detectedLanguage" : {
      "language" : "pt",
      "score" : 1.0
    },
    "translations" : [
      {
        "text" : "Cardboard boxes are easy to flatten",
        "to" : "en"
      }
    ]
  }
]
result_parse <- content(result, as = "parsed")
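[[1]]
[[1]]$detectedLanguage
[[1]]$detectedLanguage$language
[1] "pt"
[[1]]$detectedLanguage$score
[1] 1
[[1]]$translations
[[1]]$translations[[1]]
[[1]]$translations[[1]]$text
[1] "This is a test text. The big train had black smoke."
[[1]]$translations[[1]]$to
[1] "en"

[[2]]
[[2]]$detectedLanguage
[[2]]$detectedLanguage$language
[1] "pt"
[[2]]$detectedLanguage$score
[1] 1
[[2]]$translations
[[2]]$translations[[1]]
[[2]]$translations[[1]]$text
[1] "Cardboard boxes are easy to flatten"
[[2]]$translations[[1]]$to
[1] "en"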
Then it's trivial to convert it to whatever data structure you want. In my case I want a dataframe, with one row per translated string:
result_df <- do.call(rbind, lapply(result_parse, function(x) {
  data.frame(
    from_lang_det = x$detectedLanguage$language,
    from_lang_score = x$detectedLanguage$score,
    to_lang = x$translations[[1]]$to,
    trans = x$translations[[1]]$text
  )
}))
┌───────────────┬─────────────────┬─────────┬─────────────────────────┐
│ from_lang_det │ from_lang_score │ to_lang │ trans                   │
╞═══════════════╪═════════════════╪═════════╪═════════════════════════╡
│ pt            │ 1               │ en      │ This is a test text ... │
├───────────────┼─────────────────┼─────────┼─────────────────────────┤
│ pt            │ 1               │ en      │ Cardboard boxes are ... │
└───────────────┴─────────────────┴─────────┴─────────────────────────┘
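To experiment with the parsing step without an API key or using up any of the free quota, the sample response shown earlier can be parsed locally. This is a sketch under the assumption that `content(result, as = "parsed")` behaves like `jsonlite::fromJSON()` with `simplifyVector = FALSE` for JSON responses, which I believe is the case. The `sample_json` string is just the response from this post pasted in by hand.

```r
library(jsonlite)

# Sample response copied from this post; no API call is made
sample_json <- '[
  {"detectedLanguage": {"language": "pt", "score": 1.0},
   "translations": [{"text": "This is a test text. The big train had black smoke.", "to": "en"}]},
  {"detectedLanguage": {"language": "pt", "score": 1.0},
   "translations": [{"text": "Cardboard boxes are easy to flatten", "to": "en"}]}
]'

# Keep the nested list structure rather than simplifying to a data frame
result_parse <- fromJSON(sample_json, simplifyVector = FALSE)

# One row per translated string, taking the first translation of each
result_df <- do.call(rbind, lapply(result_parse, function(x) {
  data.frame(
    from_lang_det = x$detectedLanguage$language,
    from_lang_score = x$detectedLanguage$score,
    to_lang = x$translations[[1]]$to,
    trans = x$translations[[1]]$text,
    stringsAsFactors = FALSE
  )
}))

str(result_df)
```

Only the first element of `translations` is used here, which is fine when translating to a single target language.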