A little over a year ago I played around with some deep learning software – back then the latest and greatest was OpenAI’s GPT-3, although I also used the older GPT-2 and EleutherAI’s GPT-J-6B. These are all Natural Language Processing models based on the Generative Pre-trained Transformer architecture. They work by reading text to learn from, breaking it down into “tokens” made of entire words or parts of words, and then mapping the connections between the tokens they see in the text, in a way that on the surface looks similar to the connections between neurons in the brain (of course there’s a lot more to it than that, but this is hopefully a helpful simplification). Once the model has been trained, it can use the connections it’s learned between these tokens to guess what the next token might be when given some text – in essence, you give it some text and it’ll try to guess the most probable next word. Repeat this again and again, and the AI will have written entire sentences that look like they were written by a human.
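To make the “guess the next token” idea concrete, here’s a minimal sketch using the Hugging Face transformers library and the small GPT-2 model (the larger models discussed above won’t fit on a typical laptop). The prompt text is just an illustrative example:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Break the prompt into tokens, then ask the model to score every possible next token
prompt = "The meaning of life is"
input_ids = tokenizer.encode(prompt, return_tensors = "pt")
with torch.no_grad():
    logits = model(input_ids).logits

# The last position holds the model's guess at the next token
probs = torch.softmax(logits[0, -1], dim = -1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)])!r}: {float(p):.3f}")

Running this prints the model’s five most probable next words and their probabilities; picking one, appending it, and repeating is all that text generation really is.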
What’s been happening recently with the AI leaps we’ve seen (including ChatGPT, GPT-4 and Bing in the last few months) is that OpenAI, the company behind GPT, has been training its models on larger and larger corpora of data, as well as improving its code and adding filters to guide GPT towards not being evil. It turns out that the internet is a great source of natural language, and a lot of it is very easy to scrape and feed into one of these algorithms. So these models are trained on lots and lots of internet text.
This (pre-)training is very processor intensive, needing thousands of hours of compute on specialised AI hardware – typically expensive graphics cards. However, once the model has been trained, the weights it has learned are just a few hundred megabytes (or, more recently, gigabytes) in size, and can be quickly loaded into memory – and this training only needs to be done once.
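As an illustration of that last point, here’s a minimal sketch using the gpt_2_simple library (the same one I use at the end of this article). Downloading the pre-trained weights is a one-off cost (roughly 1.4 GB for the 355M model, going by my own runs), and after that, loading them and generating text takes seconds to minutes rather than thousands of hours:

import gpt_2_simple as gpt2

# One-off download of the pre-trained 355M-parameter GPT-2 weights
gpt2.download_gpt2(model_name = "355M")

# Loading the trained weights into memory is quick
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, model_name = "355M")

# Generate text from the pre-trained model, without any fine-tuning
gpt2.generate(sess, model_name = "355M", prefix = "The meaning of life is")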
The same pre-trained model can then be fine-tuned on a much smaller dataset. Using its existing ability to put together coherent sentences, the software learns to emulate the dataset it’s been fine-tuned on. The following are descriptions of some fun projects I tried out last year with this type of software that may be of interest to rationalists. Of course, all of this has been eclipsed by recent leaps in GPT’s abilities, and once I get access to GPT-4 I’m sure I’ll put it to good use!
Hate Speech Submission
I thought it would be fun to use EleutherAI’s free online GPT-J-6B deep learning model to attempt to write a Hate Speech consultation submission to parliament. I gave the software the first few paragraphs of the NZARH’s real submission, and then clicked a button to have it guess the next hundred or so words. I then fed the result back into the algorithm so that it could create the next block of text, and so on.
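For anyone wanting to reproduce that loop locally, here’s a minimal sketch using the Hugging Face transformers library. I’ve used the small GPT-2 model as a stand-in (GPT-J-6B itself needs a serious amount of memory), and submission_intro.txt is just a placeholder for a file holding the real submission’s opening paragraphs:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Start with the first few paragraphs of the real submission (placeholder file)
text = open("submission_intro.txt").read()

# Generate ~100 new tokens, then feed the whole result back in, again and again
for _ in range(5):
    input_ids = tokenizer.encode(text, return_tensors = "pt")
    output_ids = model.generate(
        input_ids,
        max_new_tokens = 100,
        do_sample = True,
        top_p = 0.9,
        pad_token_id = tokenizer.eos_token_id,
    )
    text = tokenizer.decode(output_ids[0], skip_special_tokens = True)

print(text)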
Eventually the text started getting repetitive and veering off track, but for a while what this trained algorithm gave me made sense and looked pretty believable. I don’t agree with its arguments, but it at least seemed to be creating text that reads like a coherent argument. Please be aware that the material it references, such as pieces of legislation, likely doesn’t exist – this software is just trying to emulate the kind of language it’s been trained on: writing it has read in blogs and on forums from all across the internet.
In the text below, Proposal 1 (in italics) comes from the NZARH’s actual written submission. The algorithm was given this text, along with the intro to our submission. Everything not in italics, starting at Proposal 2, was written by a deep learning algorithm without any fine-tuning. Have a read and see if, like me, you’re more than a little bit impressed with how well it did:
Proposal 1: We agree that broadening the hate speech provisions to new groups will afford these groups better protection, but non-religious or ethical beliefs should be included as a protected group
We agree that broadening the hate speech provisions to new groups will afford these groups better protection.
We consider that the proposed new groups are deserving of legal protection, and we are especially happy to see protections for groups based on sex/gender/sexuality added to our laws.
We recommend non-religious or ethical beliefs be included as a protected group in order to be consistent with international human rights laws, the New Zealand Bill of Rights Act 1990, and The Human Rights Act 1993.
Without including ethical beliefs, the proposed changes would give preferential treatment to religious groups. Persecuted minority groups (for example, ex-Muslims) would not be afforded the same level of protection from hate speech.
The Human Rights Act 1993 (s.21 (1)(d)) includes “ethical belief” as well as religion, so this change may be as simple as using “religion or ethical belief” instead of “religion” in the new legislation.
Aside from the need for inclusion of non-religious groups, we are not aware of any other groups that experience hate speech that we feel should be covered by these changes to legislation.
Proposal 2: We would recommend that the following groups be included as protected groups in the proposed legislation changes:
- Religious groups
- Ethical or non-religious groups
- Social welfare organisations
- Any person or group targeted by hate speech due to their association with any of the above
We recommend the above groups be included because they are the groups most at risk of experiencing hate speech in Aotearoa New Zealand.
We would add a provision which provides protection for the association of those groups with religion or ethical belief, in addition to the protection afforded to those groups as such.
For example, we would add “or a person or group that has been targeted by hate speech due to its association with a person or group covered by these provisions” in the same place where “ethical belief” is defined.
This would provide protection for people and groups that experience hate speech for no other reason than association with a protected group. This provision is already included in the definition of “discrimination” in the Discrimination Act 2000 (ss.4 (4) (a) and (d) ), but we consider it to be a fundamental part of the proposed changes and recommend its inclusion.
Our concern is that without including this provision, people and groups will only be afforded protection if they are religious.
Proposal 3: We recommend that the following groups be excluded as protected groups in the proposed legislation changes:
- Political groups
- Commercial groups
- Groups or individuals who have committed violent acts (which are already excluded)
We believe these groups should not be included in the proposed legislation because we do not believe they are vulnerable to hate speech.
We would further note that we consider that political parties, commercial groups, and groups that have been convicted of committing violent acts are already afforded sufficient protection to protect them from hate speech, because they already face very strong provisions for prosecution.
Furthermore, these groups are already afforded sufficient protections under current law to protect them from hate speech, so we believe they do not need to be included in the proposed changes.
Proposal 4: We believe that hate speech should be subject to the same criminal law as other forms of speech. We recommend that the following language be used in the definition of “discrimination” in the new legislation:
- “Discrimination” is any act or refusal to act by an owner, manager, controller or occupier of a place of accommodation, workplace, education institution or other place to which the public has access, that is directed against a person or group because of their religion, ethical beliefs, political views, sex/gender/sexuality, race or ethnic origin.
We recommend that this definition include “religious belief” as well as “ethical belief”, as is already included in the definition of “discrimination” in the Discrimination Act 2000 (ss.4 (4) (a) and (d) ).
QAnonsense
Using an older model, GPT-2, which had been pre-trained on a fairly small dataset, I fine-tuned the software on the entire back catalogue of QAnon posts. Below are six QAnon posts – three are the genuine article, and three are fakes created by the GPT-2 algorithm. Can you figure out which is which?
#FactsMatter – Truth to Power – Facts = Power of OP.
Who controls the truth?
Who controls the narrative? #FactsMatter #FactsMatter
We know.
Q
>>2434941
Think WRAY (yesterday).
Think [RR] (today).
Comms understood?
Do you believe in coincidences?
Q
What happens when the public finds out the TRUTH?
What happens when the FAKE NEWS MEDIA can no longer control the narrative (when disinformation is no longer swallowed and/or accepted as auto-truth)?
CHANGE IS COMING.
THE GREAT AWAKENING.
Q
>>533922
Think for yourself.
I know you can ‘defend your position’ if you can be ‘truth-telling’ and ‘vocal’ about what you feel.
But this isn’t ‘public opinion.’
These people (or ‘others’) are ‘terrorists,’ and this will ‘kill them.’
We were scared for you.
You are needed.
Q
>>7087382
Note the date of the post – Nov 2017.
Note events happening today.
Reconcile.
News unlocks.
Q
Will the MSM push the lie re: Russian collusion?
Prevent if necessary.
Will they have the courage to conduct investigation if found to be Falsifiable?
Will they have the decency or the ability to report true facts re: collusion?
Will they have the courage to conduct an unbiased investigation if found to have been Falsifiable?
Prevent if necessary.
The Great Awakening.
Q
If you’ve played along and want to know the answer: the first, fourth and sixth were made by the AI, and the second, third and fifth are genuine QAnon posts. I wasn’t able to spot the fakes without looking it up.
Trying to create a new Religion
This one surprised me a little. I fed GPT-2 about a dozen holy texts, with the intention of generating some text that was their distilled, combined wisdom. Instead, what I received each time I ran the generator was an attempt to recreate text from just one of the books I’d trained it on. The results are impressive – although the text generated by deep learning doesn’t always make a huge amount of sense, I think it could reasonably be argued that the same is true of genuine holy texts as well.
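One straightforward way to feed multiple books to gpt_2_simple is to concatenate them into a single training file. Here’s a minimal sketch of that approach – the texts/ folder, file names, run name and step count are all illustrative rather than my exact setup, and it assumes the 355M model has already been downloaded:

import glob
import gpt_2_simple as gpt2

# Concatenate the holy texts into one training file (file names are placeholders)
with open("holy_texts.txt", "w", encoding = "utf-8") as out:
    for path in sorted(glob.glob("texts/*.txt")):
        with open(path, encoding = "utf-8") as book:
            out.write(book.read() + "\n")

# Fine-tune the pre-trained 355M model on the combined file, then generate
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, "holy_texts.txt", model_name = "355M", run_name = "religion", steps = 1000)
gpt2.generate(sess, run_name = "religion")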
This is another game you can play along with. All of the texts below were generated by the GPT-2 deep learning algorithm – for each of them, can you figure out which holy text they’re emulating? I can’t be sure, but I’ve taken a good guess and included my answers underneath:
1 Esdr 15:10
Moreover the Lord said unto Moses, If ye speak to any people to speak good, say, Ye shall not speak evil; for they are of an unlawful speech of their own accord.
1 Esdr 15:11
And after the lawgiver said unto the LORD, Speak unto the heathen of the Lord, that they may teach in their own lands of the good word, and that they may not speak wickedly; for they shall never be accepted into their lands of their inheritance, until the day of peace be fulfilled.
3:005:008 Thou art the King’s counsellor, and I the King his counsellor, and thy word is truth, and I the King’s counsellor, and thy word is justice; but the words of the wicked and the wicked are in your mouth: and thou shalt not deliver them, and I the King will deliver thee: so tell me the words of the wicked, as the wickedness of my soul: and thou shalt say, If, when thou art king, thou keepest the commandments of my conscience: then thou art my brother, and I shall not be ashamed.
20:2 And the people of Judah went down to the place, and went forth to fight against the people of Jerusalem. Nevertheless, before they did so, they did it with great courage; but not of the same courage was they like unto the city of the children of Israel.
20:3 And the people of Jerusalem fought not as the people of Judah did: but they were not of the same spirit, because the LORD of hosts dealt a dispute with the people of Judah.
20:4 And the people of this city did not rise up against the people, neither went to Jerusalem with the battle; but they went down, and returned as they went down, and departed, and went their way.
20:5 Thus they were scattered, with their tents; and it was not according to their numbers for that they were so scattered in like manner; but according to their number they went up against them, and did battle.
They had seen his face from their youth, and were afraid. He had shown great power on the field; their fear was not so great. They found a place that they could put distance between the brother and his master in a distance that was ten paces. He was able to make his brother a prisoner, to cut off their right hand, and to put their master to the death by hanging. His brother then went with his master to the field, and told the people that there were two men at the door, and that the brothers had been there four or five minutes.
It is my advice therefore to make use of all means whereby I am able to make use of my intelligence in this matter. For I feel sure that the problems that lie before thee, though greater than any before, and worse still, than were before, and yet, though less, are all to be expected from the evil and wickedness that are in his sight.
To-day science reveals the Universe to be subject to the law of relativity, thus, in the sense of a flat, constant state, and yet that the Universe also changes for various conditions, – and this also indicates that matter, is not to blame for the existence of God. We do not find a continuity of matter in matter or a succession in matter over Matter. All that exists does so in degrees and that the higher degree of being leads to other degrees and that all that takes place rests upon this higher level of being.
Although I’m not 100% sure, my guess for what these texts are emulating is:
- The Apocrypha
- The King James Bible
- The Book of Mormon
- The Koran
- The Kitáb-i-Aqdas
- Science and Health With Key to the Scriptures
Conclusion
Although all of this was done just for a bit of fun, and as a way for me to get a little experience using Natural Language Processing algorithms, it was surprising just how easy it was to generate convincing text that is hard to distinguish from the source material.
And to show you just how simple this was to do on my laptop, I’ll leave you with my entire source code for generating the fake QAnon posts. Of course, the gpt_2_simple library I import is thousands of lines of code, but this gives you an idea of how little software I had to write to train and run my model:
import gpt_2_simple as gpt2

# Download the pre-trained 355M-parameter GPT-2 model (only needed once)
gpt2.download_gpt2(model_name = "355M")

# Start a TensorFlow session and fine-tune the model on the QAnon posts
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, "qanon.txt", model_name = "355M", run_name = "qanon", steps = 1000)

# Generate text from the fine-tuned checkpoint
gpt2.generate(sess, run_name = "qanon")