.SH seirdy
An experiment to test GitHub Copilot's legality
.2C 157v
.
.QP
This article was posted on 2022-07-01 by Rohan Kumar
.FS
https://seirdy.one/posts/2022/07/01/experiment-copilot-legality/
gemini://seirdy.one/posts/2022/07/01/experiment-copilot-legality/index.gmi
.FE
and is now republished in this newspaper, with permission (CC-BY-SA 4.0).
.
.
.IP "Preface"
.
.PP
I am not a lawyer.
This post is satirical commentary on:
.
.IP \(bu
The absurdity of Microsoft and OpenAI's legal justification for GitHub Copilot.
.
.IP \(bu
The oversimplifications people use to argue against GitHub Copilot (I don't like it when people agree with me for the wrong reasons).
.
.IP \(bu
The relationship between capital and legal outcomes.
.
.IP \(bu
How civil cases seem like sporting events where people “win” or “lose”, rather than opportunities to improve our understanding of law.
.
.PP
In the process, I intentionally misrepresent how the judicial system works:
I portray the system the way people like to imagine it works.
Please don't make any important legal decisions based on anything I say.
.
.PP
The only section you should take seriously is “Context: the relevant technologies”.
.
.
.IP "Introduction"
.
.PP
GitHub is enabling copyleft violation \fBat scale\fR with Copilot.
GitHub Copilot encourages people to make derivative works of source code without complying with the original code's license.
This facilitates the creation of permissively-licensed or proprietary derivatives of copyleft code.
.
.PP
Unfortunately, challenging Microsoft (GitHub's parent company) in court is a bad idea:
their legal budget probably ensures their victory, and they likely already have a comprehensive defense planned.
How can we determine Copilot's legality on a level playing field? We can create legal precedent that they haven't had a chance to study yet!
.
.PP
A chat with Matt Campbell about a speech synthesizer gave me a horrible idea.
I think I know a way to find out if GitHub Copilot is legal:
we could use its legal justification against another software project with a smaller legal budget.
Specifically, against a speech synthesizer.
The outcome of our actions could set a legal precedent to determine the legality of Copilot.
.
.
.IP "Context: the relevant technologies"
.
.PP
Let's cover the technologies and actors at play before I start my evil monologue.
.
.
.IP "Exhibit A: GitHub Copilot"
.
.PP
GitHub Copilot is a predictive autocompletion service for writing software.
It's powered by OpenAI Codex,
.FS
https://openai.com/blog/openai-codex/
.FE
a language model based on GPT-3.
.FS
https://en.wikipedia.org/wiki/GPT-3
.FE
It was trained using the source code of public repositories hosted on GitHub, regardless of their licensing.
In response to a Request for Comments from the US Patent and Trademark Office, OpenAI claimed that “Artificial Intelligence Innovation”, such as code written by GitHub Copilot, should be considered “fair use”.
.FS
See Comment Regarding Request for Comments on Intellectual Property Protection
for Artificial Intelligence Innovation submitted by OpenAI to the USPTO.
https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf
.FE
.
.PP
Many of the code snippets it suggests are exact copies of source code from various GitHub repositories.
For an example, see this tweet:
“I don't want to say anything but that's not the right license Mr Copilot.”
.FS
https://nitter.net/mitsuhiko/status/1410886329924194309
https://twitter.com/mitsuhiko/status/1410886329924194309
.FE
by Armin Ronacher.
.FS
https://lucumr.pocoo.org/about/
.FE
It contains a screen recording of Copilot suggesting this Quake code.
.FS
https://github.com/id-Software/Quake-III-Arena/blob/master/code/game/q_math.c
At line 552
.FE
When prompted to do so, it obediently fills in a permissive license.
That permissive license violates the Quake code's GPL-2.0 license.
Copilot provides no indication that a license violation is taking place.
.
.PP
GitHub performed its own research into the matter.
.FS
I doubt anybody worth their salt would count on a company to hold itself
accountable, but at least they tried.
.FE
You can read about it on their blog:
“GitHub Copilot research recitation”,
.FS
https://github.blog/2021-06-30-github-copilot-research-recitation/
.FE
by Albert Ziegler.
.FS
https://github.com/wunderalbert
.FE
I'm not convinced that it accounts for the fact that suggested code might have mechanical alterations to match surrounding text, while still remaining close enough to the training data to be a license violation.
.
.
.IP "Exhibit B: The Eloquence speech synthesizer"
.
.PP
I recently had a chat with Matt on IRC about screen readers and different types of speech synthesizers.
I mentioned that while I do like some variety, I always find myself returning to the underrated robotic voice of eSpeak NG.
.FS
https://github.com/espeak-ng/espeak-ng/
.FE
He shared some of my fondness, and also shared his preference for a similar speech synthesizer called Eloquence.
.
.PP
Downloads of Eloquence are easy to find (it's even included with the JAWS screen reader), but I struggle to find any “official” pages about the original Eloquence.
Nuance acquired Eloquent Technology, the developer of Eloquence.
Microsoft later acquired Nuance.
.
.
.IP "Eloquence sample audio"
.
.PP
Matt recorded this sample audio clip of Eloquence reading some text.
.FS
https://seirdy.one/a/eloquence.mp3
.FE
The text is from the introduction of “Best practices for inclusive textual websites”.
.FS
https://seirdy.one/posts/2020/11/23/website-best-practices/
.FE
.
.QP
My primary focus is inclusive design.
Specifically, I focus on supporting underrepresented ways to read a page.
Not all users load a page in a common web-browser and navigate effortlessly with their eyes and hands.
Authors often neglect people who read through accessibility tools, tiny viewports, machine translators, “reading mode” implementations, the Tor network, printouts, hostile networks, and uncommon browsers, to name a few.
I list more niches in the conclusion.
Compatibility with so many niches sounds far more daunting than it really is:
if you only selectively override browser defaults and use plain-old, semantic HTML (POSH), you've done half of the work already.
.
.PP
I like the Eloquence speech synthesizer.
It sounds similar to the robotic yet predictable voice of my beloved eSpeak NG, but with improved overall quality.
Unfortunately, Eloquence is proprietary.
.
.
.IP "Exhibit C: Deep learning speech synthesis"
.
.PP
Deep learning speech synthesis
.FS
https://en.wikipedia.org/wiki/Deep_learning_speech_synthesis
.FE
is a recent approach to speech synthesizer creation.
It involves training a deep neural network on voice samples, and using the trained model to generate speech similar to a real human voice.
One synthesizer using deep learning speech synthesis is Mozilla's TTS.
.FS
https://github.com/mozilla/TTS
.FE
.
.PP
Zero-shot approaches could allow a pre-trained model to generate multiple different voices.
YourTTS
.FS
https://doi.org/10.48550/arXiv.2112.02418
.FE
is one such example.
This could allow us to synthetically re-create a person's voice more easily.
.
.
.IP "My horrible plan"
.
.PP
My horrible plan revolves around going through two different lawsuits to set some judicial precedents; these precedents could improve the odds of succeeding in a lawsuit against Microsoft for Copilot's licensing violations.
.
.PP
If this succeeds, we have new legal justification that GitHub Copilot is illegal; if it fails, we have still gained a means to legally re-create proprietary software.
It's a win-win situation.
.
.
.IP "Part One: set a precedent"
.
.IP 1.
Train a modern text-to-speech (TTS) engine using the voice of a proprietary one made by a company with a small legal budget.
Keep the model's internals hidden.
.
.IP 2.
Then release the final TTS under a permissive license.
Remember, we're still keeping the machine-learning model hidden!
.
.IP 3.
Wait for that company to file suit.
.FS
If the stars align, you could file an anticipatory suit against the company.
Such suits are common when seeking a declaratory judgment regarding intellectual property rights.
https://en.wikipedia.org/wiki/Declaratory_judgment
.FE
.
.IP 4.
Win or lose the case.
.
.
.IP "Part Two: use that precedent against Microsoft's Nuance"
.
.PP
Our goal here is to get the same legal outcome as the low-stakes “trial run” of Part One.
.
.PP
Microsoft owns Nuance.
Nuance previously bought Eloquent Technology, the developer of the Eloquence speech synthesizer.
.
.IP 1.
Repeat Part One against Nuance speech synthesizers, including Eloquence.
Go to court.
.
.IP 2.
Have the ruling from Part One cited as legal precedent.
.
.IP 3.
Achieve the same outcome as Part One, demonstrating that we have indeed set precedent that works against Microsoft's legal department.
.
.
.IP "Implications of the outcomes"
.
.PP
If we \fIwin\fR both cases:
Microsoft has the legal high ground.
Making a derivative of a copyrighted work using a machine-learning algorithm allows us to bypass copyright licenses.
.
.PP
If we \fIlose\fR both cases:
Microsoft does not have the legal high ground.
We have good judicial precedent against Microsoft to use when filing suit for Copilot's behavior.
.
.PP
Either way, it's an absolute win for free software.
Taking down Copilot protects copyleft code from being turned into proprietary derivatives (and, by extension, protects software freedom).
But if we accidentally win these two low-stakes “test” cases, we still gain something else:
we can liberate huge swaths of proprietary software, starting with speech synthesizers.
.
.
.IP "Update: on satire"
.
.PP
This post isn't “satire through-and-through” like something from The Onion.
Rather, my intent was to make some clear points, but extrapolate them to absurdity to highlight other problems.
I don't think I was clear enough when doing this.
I'm sorry.
.
.PP
Copilot has been found to suggest significant amounts of code that is dangerously similar to existing works.
It does this without disclosing obligations that come with those works' licenses.
Training a model on copyrighted works may not be wrong in and of itself; however, using that model to generate new works that are not sufficiently distinct from original works is where things get problematic.
Copilot's users could apply proprietary licenses to the generated works, defeating the point of copyleft.
.
.PP
When a tool almost exclusively encourages problematic behavior, the makers of that tool should have put thought into its implications.
GitHub and OpenAI have not demonstrated a sufficiently careful approach.
.
.PP
I don't think that “going after” a smaller player just to manipulate our legal system is a good thing to do.
The fact that this idea seems plausible to some of my readers shows how warped our perception of the judicial system is.
Even if it's accurate (I doubt it's accurate, but I'm not certain), it's sad.
Judicial systems incentivise too much predatory behavior.
.
.
.IP "Corrections"
.
.PP
It's come to my attention that Eloquence may or may not still belong to Nuance.
Further research is needed.
Eloquent Technology was acquired by SpeechWorks in 2000.