
IRI in Gemini


Currently (January 2021), the specification seems silent about IRIs (Internationalized Resource Identifiers, RFC 3987). It just says "<URL> is a UTF-8 encoded absolute URL", which is absurd (a URI must be in US-ASCII). Handling IRIs would require more than that, as well as practical advice for software authors.

Issue #1 in the specification work (https://gitlab.com)
Gemini current specification (gemini.circumlunar.space)
RFC 3986 on URI syntax
RFC 3987 on IRI syntax
RFC 5890 on IDN (domain names in Unicode)


What should programs do?


It is not clear what servers and clients should do: send an IRI, accept an IRI but convert it to a URI, or something else. Tests with some clients seem to indicate that it does not always work.

Testing server, with an IDN in the name (e with accent). (gémeaux.bortzmeyer.org)
Testing server, an IRI with an IDN and a non-ASCII character in the path (gémeaux.bortzmeyer.org)


The server at the other end is (January 2021) Gemserv. The domain name was configured in Punycode ('hostname = "xn--gmeaux-bva.bortzmeyer.org"' in config.toml).

The Gemserv server (https://sr.ht)


Currently (January 2021):

  • Amfora claims the domain name does not exist (it does exist), "Failed to connect to the server: dial tcp: lookup gĂ©meaux.bortzmeyer.org: no such host."
  • AV-98 does not complain and sends the IRI, but the server I use does not understand it with the above setup
  • Bombadillo says "Found "Ă©", expected EOF"
  • Lagrange now works (before that, it said "Failed to communicate with the host. Here is the error message: Failed to look up hostname")


Proposals


Accept IRI as first-class citizens


This is more natural for a new protocol, free of HTTP legacy. Limit Punycode to the minimum (the current state of the domain name tree requires Punycode for DNS lookups).

  • parse the IRI and extract the domain name
  • convert it to Punycode
  • do the DNS lookup
  • connect to the IP address and send the IRI as request


Many software libraries already do so automatically.
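The steps above can be sketched in Python as follows. This is only an illustration, not a reference client: the function names (host_to_punycode, build_request, fetch) are made up for this page, certificate validation is switched off for brevity, and Python's built-in "idna" codec implements the older IDNA 2003 rules, which give the same result as IDNA 2008 for a name such as gémeaux.

```python
import socket
import ssl
from urllib.parse import urlsplit

DEFAULT_PORT = 1965  # the usual Gemini port


def host_to_punycode(iri: str) -> str:
    """Steps 1 and 2: extract the host from the IRI, convert it to Punycode."""
    host = urlsplit(iri).hostname
    if host is None:
        raise ValueError("no host in " + iri)
    return host.encode("idna").decode("ascii")


def build_request(iri: str) -> bytes:
    """Step 4: the request line is the IRI itself, in UTF-8, followed by CRLF."""
    return iri.encode("utf-8") + b"\r\n"


def fetch(iri: str) -> bytes:
    """Resolve the Punycode name, connect, and send the IRI unchanged."""
    ascii_host = host_to_punycode(iri)
    port = urlsplit(iri).port or DEFAULT_PORT
    context = ssl.create_default_context()
    # Assumption for this sketch only: no certificate validation at all.
    # A real client would validate the certificate or use TOFU.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    # Step 3: create_connection() does the DNS lookup on the ASCII name.
    with socket.create_connection((ascii_host, port)) as sock:
        with context.wrap_socket(sock, server_hostname=ascii_host) as tls:
            tls.sendall(build_request(iri))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("gemini://gémeaux.bortzmeyer.org/") would then look up xn--gmeaux-bva.bortzmeyer.org in the DNS but put the Unicode form on the wire. Whether the server accepts it is exactly the open question.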

Remaining issues:

  • certificates (Let's Encrypt will put Punycode in the certificate)
  • Unicode normalization. What if the client sends NFC and the server is configured with a name in NFD? RFC 5198 says NFC.
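The normalization problem can be seen with a few lines of Python. This is a small illustration using the standard unicodedata module; the helper same_name is hypothetical and only shows one possible way to compare names, by normalizing both sides to NFC as RFC 5198 recommends.

```python
import unicodedata

# The same visible text, in two different normalization forms
nfc = unicodedata.normalize("NFC", "g\u00e9meaux")   # é as one code point
nfd = unicodedata.normalize("NFD", "g\u00e9meaux")   # e + combining acute accent

print(nfc == nfd)                 # False: a naive string comparison fails
print(len(nfc), len(nfd))         # 7 8
print(nfc.encode("utf-8"))        # b'g\xc3\xa9meaux'
print(nfd.encode("utf-8"))        # b'ge\xcc\x81meaux'


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(nfc, nfd))        # True
```

A server that compares the raw bytes of the request with its configured name would treat the two forms as different names.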


RFC 5198 on a canonical Internet form of Unicode


Use Punycode and percent-encoding for everything


Another proposal is to convert all IDNs to Punycode before putting them on the wire, whether in DNS traffic or in Gemini traffic. In that case, the server is configured with the Punycode form of its name. The same goes for the path in the URI: use percent-encoding (café → caf%C3%A9). This is how the test server above is configured, and it works with Lagrange and Agunua.
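As an illustration, here is a minimal Python sketch of such a conversion, using only the standard library. The function name iri_to_uri is made up for this page; the sketch ignores any userinfo part, leaves existing percent-escapes alone, and relies on Python's "idna" codec (IDNA 2003), so treat it as an approximation of the rules in RFC 3987, not a complete implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to a pure US-ASCII URI (sketch, see assumptions above)."""
    parts = urlsplit(iri)
    if parts.hostname is None:
        raise ValueError("no host in " + iri)
    # Domain name: Punycode (A-labels)
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Path, query and fragment: UTF-8 bytes, then percent-encoding.
    # "%" stays in the safe set so that existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("gemini://g\u00e9meaux.bortzmeyer.org/caf\u00e9"))
# gemini://xn--gmeaux-bva.bortzmeyer.org/caf%C3%A9
```

With this proposal, a client would run such a conversion before both the DNS lookup and the Gemini request, so the server never sees anything but US-ASCII.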

Lagrange (https://github.com)
Agunua


Do nothing


This page, which contains IRIs, would become illegal. In this proposal, gemtext (text/gemini files) would have to use US-ASCII URIs only.
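To give an idea of what this would mean for authors, here is a small Python sketch that lists the links of a gemtext document that are not pure US-ASCII. The function name non_ascii_links is hypothetical, and the parsing is deliberately naive: it assumes that link lines start with "=>" followed by optional whitespace and the URL, and that lines starting with three backquotes toggle preformatted mode.

```python
def non_ascii_links(gemtext: str) -> list[str]:
    """Return the link targets that contain non-ASCII characters."""
    bad = []
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith("```"):
            # Toggle lines: links inside preformatted text are not links
            preformatted = not preformatted
            continue
        if preformatted or not line.startswith("=>"):
            continue
        fields = line[2:].split(maxsplit=1)  # [target, optional label]
        if fields and not fields[0].isascii():
            bad.append(fields[0])
    return bad


sample = "\n".join([
    "# IRI in Gemini",
    "=> gemini://g\u00e9meaux.bortzmeyer.org/ Testing server",
    "=> gemini://gemini.circumlunar.space/ Gemini current specification",
])
print(non_ascii_links(sample))
```

Run on the sample, it reports only the first link, the one with the IDN written in Unicode.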


Solderpunk's summary of the three proposals (https://lists.orbitalfox.eu)

His #1 solution is my "Do nothing", his #2 is "Use Punycode and percent-encoding for everything" and his #3 is "Accept IRI as first-class citizens".

RFC 3492 on Punycode


RFC 8399, IDN in certificates


The Gemini specification (gemini.circumlunar.space)


[Web] The issue in the Go-gemini library (https://github.com)
Original URL: gemini://gemini.bortzmeyer.org/gemini/iri.gmi