2024-01-08

A model of research skill

~4k words (20 minutes)

Doing research means answering questions no one yet knows the answer to. Lots of impactful projects are downstream of being good at this. A good first step is to have a model for what the hard parts of research skill are.

Two failure modes

There are two opposing failure modes you can fall into when thinking about research skill.

The first is the deferential one. Research skill is this amorphous, complicated thing, so the only way to be sure you have it is to spend years developing it within some ossified ancient bureaucracy and then have someone in a funny hat hand you a piece of paper (bonus points for Latin being involved).

The second is the hubristic one. You want to do, say, AI alignment research. This involves thinking hard, maybe writing some code, maybe doing some maths, and then writing up your results. You’re good at thinking - after all, you read the Sequences, like, 1.5 times. You can code. You did a STEM undergrad. And writing? Pffft, you’ve been doing that since kindergarten!

I think there’s a lot to be said for hubris. Skills can often be learned well by colliding hard with reality in unstructured ways. Good coders are famously often self-taught. The venture capitalists who thought that management experience and a solid business background were needed to build a billion-dollar company are now mostly extinct.

It’s less clear that research works like this, though. I’ve often heard it said that it’s rare for a researcher to do great work without having been mentored by someone who was themselves a great researcher. Exceptions exist and I’m sceptical that any good statistics exist on this point. However, this is the sort of hearsay an aspiring researcher should pay attention to. It also seems like the feedback signal in research is worse than in programming or startups, which makes it harder to learn.

Methodology, except “methodology” is too fancy a word

To answer this question, and steer between deferential confusion and hubristic over-simplicity, I interviewed people who had done good research to try to understand their models of research skill. I also read a lot of blog posts. Specifically, I wanted to understand what about research a bright, agentic, technical person trying to learn at high speed would likely fail at and either not realise or not be able to fix quickly.

I did structured interviews with Neel Nanda (Google DeepMind; grokking), Lauro Langosco (Krueger Lab; goal misgeneralisation), and one other. I also learned a lot from unstructured conversations with Ferenc Huszar, Dmitrii Krasheninnikov, Sören Mindermann, Owain Evans, and several others. I then ~procrastinated on this project for 6 months~ touched grass and formed inside views by doing the MATS research program under the mentorship of Owain Evans. I owe a lot to the people I spoke to and their willingness to give their time and takes, but my interpretation and model should not be taken as one they would necessarily endorse.

My own first-hand research experience consists mainly of a research-oriented CS (i.e. ML) master’s degree, followed by working as a full-time researcher for 6 months and counting. There are many who have better inside views than I do on this topic.

The Big Three

In summary:

  1. There are a lot of ways reality could be (i.e. hypotheses), and a lot of possible experiment designs. You want to avoid brute-forcing your way through these large spaces as much as possible, and instead be good at picking likely-true hypotheses or informative experiments. Being good at this is called research taste, and it’s largely an intuitive thing that develops over a lot of time spent engaging with a field.
  2. Once you have some bits of evidence from your experiment, it’s easy to over-interpret them (perhaps you interpret them as more bits than they actually are, or perhaps you fail to consider how large the hypothesis space was to start with). To counteract this, you need sufficient paranoia about your results, which mainly just takes careful and creative thought, and good epistemics.
  3. Finally, you need to communicate your results to transfer those bits of evidence into other people’s heads, because we live in a society.

Taste

Empirically, it seems that a lot of the value of senior researchers is a better sense of which questions are important to tackle, and better judgement for what angles of attack will work. For example, good PhD students often say that even if they’re generally as technically competent as their adviser and read a lot of papers, their adviser has much better quick judgements about whether something is a promising direction.

When I was working on my master’s thesis, I had several moments where I was working through some maths and got stuck. I’d go to one of my supervisors, a PhD student, and they’d have some ideas on angles of attack that I hadn’t thought of. We’d work on it for an hour and make more progress than I had in several hours on my own. Then I’d go to another one of my supervisors, a professor, and in fifteen minutes they’d have tried something that worked. Part of this is experience making you faster at crunching through derivations, and knowing things like helpful identities or methods. But the biggest difference seemed to be a good gut feeling for what the most promising angle or next step is.

I think the fundamental driver of this effect is dealing with large spaces: there are many possible ways reality could be (John Wentworth talks about this here), and many possible things you could try, and even being slightly better at homing in on the right things helps a lot. Let’s say you’re trying to prove a theorem that takes 4 steps to prove. If you have an 80% chance of picking the right move at each step, you’ll have a 41% chance of success per attempt. If that chance is 60%, you’ll have a 13% chance – over 3 times less. If you’re trying to find the right hypothesis within some hypothesis space, and you’ve already managed to cut down the entropy of your probability distribution over hypotheses to 10 bits, you’ll be able to narrow down to the correct hypothesis faster and with fewer bits of evidence than someone whose entropy is 15 bits (and whose search space is therefore effectively 2^5 = 32 times as large). Of course, you’re rarely chasing down just a single hypothesis in a defined hypothesis class. But if you’re constantly 5 extra bits of evidence ahead of someone else in what you’ve incorporated into your beliefs, you’ll make weirdly accurate guesses from their perspective.
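
To make the arithmetic above concrete, here is a minimal sketch of the two toy calculations (the per-step probabilities and entropy figures are illustrative assumptions, not estimates of any real research process):

```python
# Toy numbers only: illustrative assumptions, not measurements.

# 1. A proof that takes 4 steps, with some chance of picking the right
#    move at each step.
for p_step in (0.8, 0.6):
    p_proof = p_step ** 4
    print(f"per-step {p_step:.0%} -> per-attempt {p_proof:.0%}")
# per-step 80% -> per-attempt 41%
# per-step 60% -> per-attempt 13%

# 2. Remaining entropy over hypotheses -> effective number of
#    hypotheses you still have to distinguish between.
for entropy_bits in (10, 15):
    print(f"{entropy_bits} bits -> ~{2 ** entropy_bits:,} hypotheses")
# 10 bits -> ~1,024 hypotheses
# 15 bits -> ~32,768 hypotheses (2^5 = 32 times as many)
```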

Why does research taste seem to correlate so strongly with experience? I think it’s because the bottleneck is seeing and integrating evidence into your (both explicit and intuitive) world models. No one is close to having integrated all the empirical evidence that exists, and new evidence keeps accumulating, so the returns from reading and seeing more keep coming. (In addition to literal experiments, I count things like “doing a thousand maths problems in this area of maths” as “empirical” evidence for your intuitions about which approaches work; I assume this gets distilled into half-conscious intuitions that your brain can then use when faced with similar problems in the future.)

This suggests that the way to speed-run getting research taste is to see lots of evidence about research ideas failing or succeeding. To do this, you could:

  1. Have your own research ideas, and run experiments to test them. The feedback quality is theoretically ideal, since reality does not lie (but may be constrained by what experiments you can realistically run, and a lack of the paranoia that I talk about next). The main disadvantage is that this is often slow and/or expensive.
  2. Read papers to see whether other people’s research ideas succeeded or failed. This is prone to several problems:
    1. Biases: in theory, published papers are drawn from the set of ideas that ended up working, so you might not see negative samples (which is bad for learning). In practice, paper creation and selection processes are imperfect, so you might see lots of bad or poorly-communicated ones.
    2. Passivity: it’s easy to fool yourself into thinking you would’ve guessed the paper ideas beforehand. Active reading strategies could help; for example, read only the paper’s motivation section and write down what experiment you’d design to test it, and then read only the methodology section and write down a guess about the results.
  3. Ask someone more experienced than you to rate your ideas. A mentor’s feedback is not as good as reality’s, but you can get it a lot faster (at least in theory). The speed up is huge: a big ML experiment might take a month to set up and run, but you can probably get detailed feedback on 10 ideas in an hour of conversation. This is a ~7000x speedup. I suspect a lot of the value of research mentoring lies here: an enormous amount of predictable failures or inefficiently targeted ideas can be skipped or honed into better ones, before you spend time running the expensive test of actually checking with reality. (If true, this would imply that the value of research mentorship is higher whenever feedback loops are worse.)
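
For the record, a back-of-the-envelope version of the “~7000x” figure above, under the rough assumptions stated in that point (about a month of wall-clock time per experiment, 10 ideas discussed per hour of conversation):

```python
hours_per_experiment = 30 * 24   # ~a month of wall-clock time per idea (assumption)
hours_per_mentor_idea = 1 / 10   # 10 ideas discussed in an hour (assumption)
print(round(hours_per_experiment / hours_per_mentor_idea))  # 7200, i.e. ~7000x
```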

Chris Olah has a list of suggestions for research taste exercises (number 1 is essentially the last point on my list above).

Research taste takes the most time to develop, and seems to explain the largest part of the performance gap between junior and senior researchers. It is therefore the single most important thing to focus on developing.

(If taste is so important, why does research output not increase monotonically with age in STEM fields? The scary biological explanation is that fluid intelligence (or energy or …) starts dropping at some age, and this decreases your ability to execute on maths/code, even assuming your research taste is constant or improving. Alternatively, hours used on deep technical work might tend to decline with advanced career stages.)

Paranoia

I heard several people saying that junior researchers will sometimes jump to conclusions, or interpret their evidence as saying more than it actually does. My instinctive reaction to this is: “wait, but surely if you just creatively brainstorm the ways the evidence might be misleading, and take these into account in making your conclusions (or are industrious about running additional experiments to check them), you can just avoid this failure mode?” The average answer I got was that yes, this seems true, and indeed many people either only need one peer review cycle to internalise this mindset, or pretty much get it from the start. Therefore, I’m almost tempted to chuck this category off this list, and onto the list of less crucial things where “be generally competent and strategic” will sort you out in a reasonable amount of time. However, two things hold me back.

First, confirmation bias is a strong thing, and it seems helpful to wave a big red sign saying “WARNING: you may be about to experience confirmation bias”.

Second, I think this is one of the cases where the level of paranoia required is sometimes more than you expect, even after you expect it will be high. John Wentworth puts this best in You Are Not Measuring What You Think You Are Measuring, which you should go read right now. There are more confounders and weird effects than are dreamt of in your philosophies.

A few people mentioned going through the peer review process as being a particularly helpful thing for developing paranoia.

Communication

I started out sceptical about the difficulty of research-specific communication, above and beyond general good writing. However, I was eventually persuaded that yes, research-specific communication skills exist and are important.

First, if research has impact, it is through communication. Rob Miles once said (at a talk) something along the lines of: “if you’re trying to ensure positive AGI outcomes through technical work, and you think that you are not going to be one of the people who literally writes the code for it or is in the room when it’s turned on, your path to impact lies through telling other people about your technical ideas.” (This generalises: if you want to drive good policy through your research and you’re not literally writing it …, etc.) So you should expect good communication to be a force multiplier applied on top of everything else, and therefore very important.

Secondly, research is often not communicated well. On the smaller scale, Steven Pinker moans endlessly – and with good reason – about academic prose (my particular pet peeve is the endemic utilisation of the word “utilise” in ML papers). On the larger scale, entire research agendas can get ignored because the key ideas aren’t communicated in a sufficiently clear and legible way.

I don’t know the best way to speed-run getting good at research communication. Maybe read Pinker to make sure you’re not making predictable mistakes in general writing. I’ve heard that experienced researchers are often good at writing papers, so maybe seek feedback from any you know (but don’t internalise the things they say that are about goodharting for paper acceptance). With papers, understand how papers are read. Some sources of research-specific communication difficulty I can see are (a) the unusually high need for precision (especially in papers), and (b) communicating the intuitive, high-context, and often unverbalised-by-default world models that guide your research taste (especially when talking about research agendas).

Other points

  • Having a research problem is not enough. You need an angle of attack.
    • Richard Feynman once said something like: keep a set of open problems in your head. Whenever you discover a new tool (e.g. a new method), run through this list of problems and see if you can apply it. I think this can also be extended to new facts; whenever you hear about a discovery, run through a list of open questions and see how you should update.
    • Hamming says something similar in You and your research: “Most great scientists know many important problems. They have something between 10 and 20 important problems for which they are looking for an attack.”
  • Research requires a large combination of things to go right. Often, someone will be good at a few of them but not all of them.
    • A sample list might be:
      • generating good ideas
      • picking good ideas (= research taste)
      • iterating rapidly to get empirical feedback
      • interpreting your results right (paranoia)
      • communicating your findings
    • If success is a product of sufficiently many independent positive variables (equivalently, if its logarithm is a sum of many terms), the distribution of success should be roughly log-normal, and therefore fairly heavy-tailed; a quick simulation of this is sketched after this list. And yes, research is heavy-tailed. Dan Hendrycks and Thomas Woodside claim that while there may be 10x engineers, there are 1000x researchers. This seems true.
      • However, this also means that not being the best at one of the component skills does not doom your ability to still have a really good product across categories.
  • Ideas from other fields are often worth stealing. There exist standardised pipelines to produce people who are experts in X for many different X, but far less so to produce people who are experts in both X and some other Y. Expect many people in X to miss out on ideas in Y (though remember that not all Y are relevant).
  • Research involves infrequent and uncertain feedback. Motivation is important and can be hard. Grad students are notorious for having bad mental health. A big chunk of this is due to the insanities of academia rather than research itself. However, startups are somewhat analogous to research (high-risk, difficult, often ambiguous structure), lack institutionalised insanity, and are also acknowledged to be mentally tough.
    • The most powerful and universally-applicable hack to make something not suck for a human is for that human to do it together with other humans. Also, more humans = more brains.
  • Getting new research ideas is often not a particularly big-brained process. I once had the impression that most research ideas come from explicitly thinking hard about research ideas, and that generating fancy ideas would be a major bottleneck. However, I’ve found that many ideas come with surprisingly little effort, with a feeling of “well, if I want X, the type of thing I should do is probably Y”. Whiteboarding with other people is also great.
    • This is not to say that idea generation isn’t helped by actively brainstorming hard. Just that it’s not the only, or even majority, source of ideas.
    • The feeling of ideas being rare is often a newbie phase. You should (and very likely will) pass through it quickly if you’re engaging with a field. John Wentworth has a good post on the topic. I have personally experienced an increase in concrete research ideas, and a much greater willingness to discard ideas, after going through a few I’ve felt excited by.
    • When you look at a field from afar, you see a smooth shape of big topics and abstractions. This makes it easy to feel that everything is done. Once you’re actually at the frontier, you invariably discover that it’s full of holes, with many simple questions that don’t have answers.
  • There’s great benefit to an idea being the top thing in your mind.
  • When in doubt, log more. Easily being able to run more analyses is good. At some point you will think to yourself something like “huh, I wonder if thing X13 had an effect, I’ll run the statistics”, and then either thank yourself because you logged the value of X13 in your experiments, or facepalm because you didn’t.
  • Tolerate the appearance of stupidity (in yourself and others). Research is an intellectual domain, and humans are status-obsessed monkeys. Humans doing research therefore often feel like they need to appear smart. This can lead to a type of wishful thinking where you hear some idea and try to delude yourself (and others) into thinking you understand it immediately, without actually knowing how it bottoms out into concrete things. Remember that any valid idea or chain of reasoning decomposes into simple pieces. Allow yourself to think about the simple things, and ask questions about them.
    • There is an anecdote about Niels Bohr (related by George Gamow and quoted here): “Many a time, a visiting young physicist (most physicists visiting Copenhagen were young) would deliver a brilliant talk about his recent calculations on some intricate problem of the quantum theory. Everybody in the audience would understand the argument quite clearly, but Bohr wouldn’t. So everybody would start to explain to Bohr the simple point he had missed, and in the resulting turmoil everybody would stop understanding anything. Finally, after a considerable period of time, Bohr would begin to understand, and it would turn out that what he understood about the problem presented by the visitor was quite different from what the visitor meant, and was correct, while the visitor’s interpretation was wrong.”
  • “Real ~artists~ researchers ship”. Like in anything else, iteration speed really matters.
    • Sometimes high iteration speed means schlepping. You should not hesitate to schlep. The deep learning revolution started when some people wrote a lot of low-level CUDA code to get a neural network to run on a GPU. I once reflected on why my experiments were going slower than I hoped, and realised a mental ick for hacky code was making me go about things in a complex roundabout way. I spent a few hours writing ugly code in Jupyter notebooks, got results, and moved on. Researchers are notorious for writing bad code, but there are reasons (apart from laziness and lack of experience) why the style of researcher code is sometimes different from standards of good software.
    • The most important thing is doing informative things that make you collide with reality at a high rate, but being even slightly strategic will give great improvements on even that. Jacob Steinhardt gives good advice about this in Research as a Stochastic Decision Process. In particular, start with the thing that is most informative per unit time (rather than e.g. the easiest to do).
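
On the log-normal point in the list above: here is a quick illustrative simulation (the number of component skills and the size of the per-skill multipliers are made-up assumptions) showing that a product of independent positive factors ends up heavy-tailed:

```python
import random

def simulated_output(n_skills: int = 6) -> float:
    # Each component skill multiplies output by a factor drawn
    # log-uniformly between 0.25x and 4x (made-up numbers).
    out = 1.0
    for _ in range(n_skills):
        out *= 4 ** random.uniform(-1, 1)
    return out

samples = sorted(simulated_output() for _ in range(10_000))
median = samples[len(samples) // 2]
top_1_percent = samples[-100:]
print(f"median ~{median:.1f}, mean of top 1% ~{sum(top_1_percent) / 100:.0f}")
# The top 1% typically lands one to two orders of magnitude above the
# median: a few "researchers" account for a disproportionate share of output.
```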

Good things to read on research skill

(I have already linked to some of these above.)

2023-06-04

A Disneyland Without Children

The spaceship swung into orbit around the blue-grey planet with a final burn of its engines. Compared to the distance they had travelled, the world, now only some four hundred kilometres below and filling up one hemisphere of the sky, was practically within reach. But Alice was no less confused.

“Well?” she asked.

Charlie stared thoughtfully at the world slowly rotating underneath their feet, oceans glinting in the sunlight. “It looks lickable”, he said.

“We have a task”, Alice said, trying to sound gentle. Spaceflight was hard. Organic life was not designed for it. But their mission was critical, they needed to move fast, and Charlie, for all his quirks, would need to be focused.

“What’s a few minutes when it will take years for anything we discover to be known back home?” Charlie asked.

“No licking”, Alice said.

Charlie rolled his eyes, then refocused them on the surface of the planet below. They were just crossing the coast of one of the larger continents. Blue water was giving way to grey land.

“Look at the texture”, Charlie said. They had seen it from far away with telescopes, but there was something different about seeing it with their bare eyes. Most of the land surface of the planet was like a rug of fine grey mesh. If there had been lights, Alice would have guessed the entire planet’s land was one sprawling city, but as far as their instruments could tell, the world had no artificial lighting.

As far as they could tell, the world also had no radio. They had broadcast messages at every frequency they could, and in desperation even by using their engines to flash a message during their deceleration burn. No response had come.

Alice pulled up one of the telescope feeds on the computer to look closer at the surface. She saw grey rectangular slabs, typically several hundred metres on a side, with wide roads running between them. The pattern was not perfect - sometimes it was irregular, and sometimes there were smaller features too. Some of the smaller ones moved.

“Are they factories?” Charlie asked.

“I’d guess so”, Alice said, watching on the telescope feed as a steady stream of rectangular moving objects, each about ten metres long, slid along a street. Another such stream was moving along an intersecting street, and it looked like they would crash at the intersection, but the timing and spacing was such that vehicles from one stream crossed the road just as there were gaps in vehicles along the other stream.

“A planet covered by factories, then”, Charlie said. “With no one home to turn the lights on.”

“I want to see what they’re making”, Alice said.

-

All through the atmospheric entry of their first drone package, Alice sat tight in her seat and clenched and unclenched her hands. So far all they had done was passive observation or broadcasting. A chunky piece of hardware tracing a streak of red-hot plasma behind it was a much louder knock. She imagined alien jet fighters scrambling to destroy their drones, and some space defence mechanism activating to burn their ship.

The image she saw was a jittery camera feed, showing the black back of the heatshield, the grey skin of the drone package, and a sliver of blue sky. It shook violently as the two halves of the heatshield detached from each other and then the drone package, tumbling off in opposite directions. Land became visible, kilometres below, the grey blocks of the buildings tiny like children’s blocks but still visibly three-dimensional, casting shadows and moving as the drone package continued falling.

The three drones tested their engines, and for a moment flew - or at least slowed their descent - in an ungainly joint configuration, before breaking off from each other and spreading their wings to the fullest. The feed showed the other two drones veering off into the distance on wide, narrow wings, and then the view pulled up as the nose of the drone lifted from near-vertical to horizontal.

“Oops, looks like we have company”, Charlie said. He had been tapping away at some other screens while Alice watched the drone deployment sequence.

Alice jumped up from her seat. “What?”

“Our company is … a self-referential joke!”

Alice resisted the temptation to say anything and instead sank back into her seat. On her monitor, the grey blocks continued slowly moving below the drone. She tapped her foot against the ground.

“Actually though”, Charlie said. “We’re not the only ones in orbit around this planet.”

“What else is orbiting? Has your sense of shame finally caught up with you and joined us?”

“Looks like satellites. Far above us, though. Can you guess how far?”

“I’d guess approximately the distance between you and maturity, so … five light-years?”

Charlie ignored her. “Exactly geostationary altitude”, he said, grinning. The grin was like some platonic ideal of intellectual excitement; too pure for Alice’s annoyance to stay with her, or for her to feel scared about the implications.

“But nothing in lower orbits?” Alice asked.

“No”, Charlie said. “Someone clearly put them there; stuff doesn’t end up at exactly geostationary altitude unless someone deliberately flies a communications or GPS satellite there. Now I can’t be entirely sure that the geostationary satellites are completely dead, but I’d guess that they are.”

“Like everything else”, Alice said, but even as she said so she caught sight of a long trail of vehicles making its way along one of the roads. There was something more real about seeing them on the drone feed.

“Maybe this is just a mining outpost”, Charlie said. “Big rocket launch to blast out a billion tons of ore to god-knows-where, once a year.”

“Or maybe they’re hiding underground or in the oceans”, Alice said.

“Let’s get one of the drones to drop a probe into the oceans. I’ll send one of our initial trio over to the nearest one, it’s only a few hundred kilometres away”, Charlie said.

“Sure”, Alice said.

They split the work of flying the drones, two of them mapping out more and more of the Great Grey Grid (as Alice took to calling it in her head), and one flying over the planet’s largest ocean.

Even the oceans were mostly a barren grey waste. Not empty, though. They did eventually see a few small scaly fish-like creatures that stared at their environment with uncomprehending eyes. Alien life. A young Alice would have been ecstatic. But now she was on a mission, and her inability to figure out what had happened on this planet annoyed her.

In addition to the ocean probe, they had rovers they could send crawling along the ground. Sometimes the doors of the square buildings were open, and Alice would drive a rover past one opening. Most seemed to be either warehouses of stacked crates or some kind of automated assembly line of skeletal grey robot arms and moving conveyor belts. A few seemed to place more barriers between the open air and their contents; what went on there, the rovers did not see.

The first time Alice tried to steer a rover into a building, it got run over by a departing convoy of vehicles. The vehicles were rectangular in shape but with an aerodynamic head, with three wheels on each side. Based on their dimensions, she could easily imagine one weighing ten or twenty tons. The rover had no chance.

“Finally!” Charlie had said. “We get to fight these aliens.”

But there was no fight. It seemed like it had been a pure accident, without any hint of malice. The grey vehicles moved and stopped on some schedule of their own, and for all Alice knew they were not just insensitive beasts but blind and dumb ones too.

The next rover got in, quickly scooting through the side of the entrance and then off to one side, out of the path of the grey vehicles. It wandered the building on its own, headlights turned on in the otherwise-dark building to bring back a video stream of an assembly line brooded over by those same skeletal hands they had glimpsed from outside. Black plastic beads came in by the million on the grey vehicles. A small thin arm with a spike on the end punctured a few holes on one side, and using these holes two of the black beads were sewn onto an amorphous plushy shape. The shape got appendages, was covered with a layer of fluff, and the entire thing became a cheerful purple when it passed through an opaque box with pipes leading into it. It looked like a child’s impression of a hairy four-legged creature with black beady eyes above a long snout. A toy, but for whom?

The conveyor belt took an endless line of those fake creatures past the rover’s camera at the end of the assembly line. Alice watched them go, one by one, and fall onto the open back of a grey vehicle. It felt like each and every one made eye contact with her, beady black eyes glinting in the light. She watched for a long time as the vehicle filled up. Once it did, a panel slid over the open top to close the cargo bay, and it sped off out the door. The conveyor belt kept running, but there was a gap of a few metres to the next plushy toy. It came closer and closer to the end - and suddenly another vehicle was driving into place, and the next creature was falling, and it just barely dropped into the cargo hold as the vehicle slid into position.

“How scary do you find the Blight?” Alice asked.

“Scary enough that I volunteered for this mission”, Charlie said.

Alice remembered the charts they had been shown. They had been hard to miss; even the news, usually full of celebrity gossip and political machinations, had quickly switched to concentrating on the weirdness in the sky once the astronomers spotted it. Starlight dimming in many star systems, and what remained of the light spectra shifting towards the infrared. Draw a boundary around the affected area, and you get a sphere 30 light-years wide, expanding at a third of the speed of light. At the epicentre, a world that had shown all the signs of intelligent life that could be detected from hundreds of light-years away - a world that astronomers had broadcast signals to in the hopes of finally making contact with another civilisation - that had suddenly gone quiet and experienced a total loss of oxygen in its atmosphere. The Blight, they had called it.

In the following years, civilisation had mobilised. A hundred projects had sprung forth. One of them: go investigate the star system that was the second-best candidate for intelligent life, but had refused to answer radio signals, and see if someone was there to help. That was why they were here.

“I think I found something as scary as the Blight”, Alice said. “Come look at this.”

The purple creatures kept parading past the camera feed.

-

Over the next five days, while the Blight advanced another forty billion kilometres towards everything they loved back home, Alice and Charlie were busy compiling a shopping catalogue.

“Computers”, Alice said. “Of every kind. A hundred varieties of phones, tablets, laptops, smartwatches, smartglasses, smart-everything.”

“Diamonds and what seems to be jewellery”, Charlie said.

“Millions of tons of every ore and mineral.” They had used their telescopes on what seemed to be a big mine, but they had barely needed them. It was like a huge gash in the flesh of a grey-fleshed and grey-blooded giant, complete with roads that looked like sutures. There were white spots in the image, tiny compared to the mine, each one a sizeable cloud.

“Clothes”, Charlie continued. “Lots and lots of clothes of different varieties. They seem to be shipped around warehouses until they’re recycled.”

“Cars. Sleek electric cars by the million. But we never see them used on the roads, though there are huge buildings where brand-new cars are recycled. And airplanes, including supersonic ones.”

“A lot of things that look like server farms”, Charlie said. “Including ones underwater and on the poles. There’s an enormous amount of compute in this world. Like, mind-boggling. I was thinking we should figure out how to plug into all of it and mine some crypt-”

“Ships with nuclear fusion reactors”, Alice interrupted. There were steady trails of them cutting shortest-path routes between points on the coast.

“Solar panels”, Charlie said. “Basically every spare surface. The building roofs are all covered with solar panels.”

“And children’s plush toys”, Alice said.

They were silent for a while.

“We have a decent idea of what these aliens looked like”, Alice said. “They were organic carbon-based lifeforms, like us. Similar in size too, also bipedal. And it’s like they left some ghostly satanic industrial amusement park running, going through all the motions in their absence, and disappeared.”

“And they didn’t go to space, as far as we know”, Charlie said.

“At least we don’t have any more Blights to worry about then”, Alice said. “I can’t help but imagining that the Blight is something like this. Something that just tiles planets with a Great Grey Grid, does something even worse to the stars, and then moves on.”

“They had space technology, but apparently whoever built the Great Grey Grid didn’t fancy it”, Charlie said. “The satellites might predate it. Probably there were satellites in lower orbits too, but their orbits decayed and they fell down, so we only see the geostationary ones up high.”

“And then what?” Alice said. “All of them vanished into thin air and left behind a highly-automated ghost-town?”

Charlie shrugged.

“Can we plug ourselves into their computers?” Alice asked.

“To mine cr-?”

“To see if anyone’s talking.”

Charlie groaned. “You can’t just plug yourself into a communication system and see anything except encrypted random-looking noise.”

“How do you know they encrypt anything?”

“It would be stupid not to”, Charlie said.

“It would be stupid to blind yourself to the rest of the universe and manufacture a billion plush toys”, Alice said.

“Seems like it will work for them until the Blight arrives.”

-

Alice floated in the middle of the central corridor of the ship. The ship was called Legacy, but even before launch they had taken to calling it “Leggy” for short. The central corridor linked the workstation at the front of the ship where they spent most of their days to the storage bay at the back. In the middle of the corridor, three doors at 120-degree angles from each other led to the small sleeping rooms, each of them little more than a closet.

Alice had woken up only a few minutes ago, and still felt an early-morning grogginess as well as the pull of her bed. The corridor had no windows or video feeds, but was dimly lit by the artificial blue light from the workstation. They were currently on the night side of the planet.

She took a moment to look at the door of the third sleeping room. It was closed, like always, with its intended inhabitant wrapped in an air-tight seal of plastic in a closed compartment of the storage bay. They would flush him into space before they left for home again; they could have no excess mass on the ship for the return journey.

Alice thought again of the hectic preparations for the mission. Apart from Blightsource, this was the only planet the astronomers had spotted that might have intelligent life on it, and the indications were vague. But when you look into space and see something that looks like an approaching wall of death - well, that has a certain way of inspiring long-shots. Hence the mission, hence Legacy’s flight, hence crossing over the vast cold stretch of interstellar space to see if any answers could be found on this world. Hence Bob’s death while in cryonic suspension for the trip. Hence the hopes of all civilisation potentially resting on her and Charlie figuring out something valuable.

If Charlie and she could find something on this world, some piece of insight or some tool or weapon among the countless pieces of technological wizardry that this world had in spades, that had a credible chance against the Blight when it arrived … maybe there was hope.

Alice pushed off on the wall and set herself in a slow spinning motion. The ship seemed to revolve around her. Bob’s door revolved out of sight, and Charlie’s door became visible -

Wait.

Her gravity-bound instincts kicked in and she tried to stop the spin by shoving back with her hands, but there was nothing below her, so she remained spinning slowly. She breathed in deeply to calm herself down, then kicked out a foot against the wall to push herself to the opposite one. She grabbed one of the handles on the wall and held onto it.

The light on Charlie’s room was off. That meant it was empty.

“Charlie!” Alice called.

No response.

The fear came fast. Here she was, light-years from home, perhaps all alone on a spaceship tracing tight circles around a ghostly automated graveyard planet. The entire mass of the planet stood between her and the sun. Out between the stars, the Blight was closing in on her homeworld. She counted to calm herself down; one, two, three, … and just like that, the Blight was three hundred thousand kilometres closer to home. Unbidden, an image of the fluffy purple creature popped up in her mind, complete with its silly face and unblinking eye contact.

Soundlessly, she used the handles on the wall of the corridor to pull herself towards the workstation. She reached the door, peered inside -

There was Charlie, staring at a computer screen. He looked up and saw Alice. “You scared me!” he said. “Watch out, no need to sneak behind me so quietly.”

“I called your name”, Alice said.

“I know, I know”, Charlie said. “But I’m on to something here, and I just want to run a few more checks and then surprise you with the result.”

“What result?” Alice glanced at some of the screens. Two of the drones were above the Great Grey Grid, one above the ocean. With their nuclear power source, they could stay in the air as long as they wanted. Even though their focus was no longer aerial reconnaissance, there was no reason not to keep them mapping the planet from up close, occasionally picking up things that their surveys from the ship did not.

“I fixed the electrical issues with the rover and the cable near the data centre”, Charlie said.

“So you’re getting data, not just frying our equipment?”

“Yes”, Charlie said. “And guess what?”

“What?”

“Guess!”

“You found a Blight-killer”, Alice said.

“No! Even better! These idiots don’t encrypt their data as far as I can tell. And I think a lot of it is natural language.”

“Okay, and can we figure out what it means?”

“We have automated programs for trying to derive syntax rules and so on”, Charlie said. “It’s already found something, including good guesses of which words are prepositions and what type of grammar they have. But mapping words to meaning based on purely statistics of how often they occur is hard.”

“I’ve seen products they have with pictures and instruction manuals”, Alice said. “We could start there.”

“Oh no”, Charlie said. “This is going to be a long process.”

-

By chance, it turned out not to be. Over the next day, they had sent a rover to a furniture factory and had managed, after some attempts, to steal an instruction leaflet out of a printer before the robotic arm could snatch it to be packaged with the furniture. Somehow Alice was reminded of her childhood adventures stealing fruit from the neighbour’s garden.

They had figured out which words meant “cupboard”, “hammer”, and “nail”, and so on. But then another rover on the other side of the world had seen something. It was exploring a grey and windy coast. On one side of the rover was the Great Grey Grid and the last road near the coast, the occasional vehicle hurtling down it. But on the other side was a stretch of rocky beach hammered by white-tipped waves, a small sliver of land that hadn’t been converted to grey.

The land rose by the beach, forming a small hill with jagged rocky sides. The sun shone down on one face of it, but there was a hollow, or perhaps a small cave, that was left in the dark by the overhanging rock. And around this entrance, several unmistakable symbols were scratched into the rock, each several metres high.

Alice took manual control of the rover and carefully instructed it to drive over the rocky beach towards the cave entrance. On the way it passed what seemed to be a fallen metal pole with some strips of fabric still clinging to it.

Once it was close enough to the mouth of what turned out to be a small cave, the camera could finally see inside.

There was a black cabinet inside. Not far from it, lying on the ground, was the skeleton of a creature with four slender limbs and a large head. Empty eye sockets stared out towards the sky.

Alice felt her heart beating fast. It wasn’t quite right; many of the anatomical details were off. But it was close enough, the similarity almost uncanny. Here, hundreds of light years away, evolution had taken a similar path, and produced sapience. And then killed it off.

“Charlie”, she said in a hoarse voice.

“What?” Charlie asked, sounding annoyed. He had been staring at an instruction manual for a chair, but he looked up and saw the video feed. “Oh”, he said, in a small voice. “We found them.”

Alice tore her eyes away from the skeleton and to the small black cabinet. It had a handle on it. She had the rover extend an arm and open it.

-

The capsule docked with Leggy and in the weightless environment they pushed the cabinet easily into the ship. They had only two there-and-back-again craft - getting back to orbit was hard - but they had quickly decided to use one to get this cabinet up. It had instructions, after all; very clear instructions, though ones that their rovers couldn’t quite follow.

It started from a pictographic representation, etched onto plastic cards, of how you were supposed to read the disks. They managed to build something that could read the microscopic grooves on the disk as per the instructions, and transfer the data to their computers.

After a few hours of work, they had figured out the encodings for numbers, the alphabet, their system of units, and seemingly also some data formats, including for images.

Confirmation came next. The next item on the disk was an image of two of the living aliens, standing on a beach during a sunset. Alice stared into their faces for a long time.

Then came images paired with what were clearly words of text, about fifty of them. Some of the more abstract ones took a few guesses, but ultimately they thought they had a base vocabulary, and with the help of some linguistics software, it did not take very long before they had a translated vocabulary list of about eight thousand words.

Alice was checking the work when Charlie almost shouted: “Look at this!”

Alice looked at what he was pointing at. It was a fragment of text that read:

Hello,

The forms for ordering the new furniture are attached. Please fill them in and we will respond to your order as quickly as we can!

If you need any help, please contact customer support. You will find the phone number on our website.

“What is this? Is Mr Skeleton trying to sell us furniture from beyond the grave?” Alice asked.

“No”, Charlie said. “This isn’t what I got from the recovered data; I haven’t looked at the big remaining chunk yet. This is what I got by interpreting one of the packets of data running on the cables that our rover is plugged into using what we now know about their data formats and the language.”

“And?”

“I don’t get it!” Charlie said. “Why would a world of machines send each other emails in natural language?”

“Why would they manufacture plushy toys? I doubt the robotic arms need cuddles.”

Charlie looked at the world, slowly spinning underneath their ship. “Being so close to it makes me feel creeped out. I don’t get it.”

“You don’t want to lick it anymore?” Alice asked. She decided not to tell Charlie about her own very similar feelings earlier, when she thought for a moment Charlie had gone missing.

Charlie ignored her. “I think the last thing on Mr Skeleton’s hard-drive is a video”, he said. “I’ve checked and it seems to play.”

“You looked at it first?” Alice said in a playfully mocking tone. The thrill of discovery was getting to her.

“Only the first five frames”, Charlie said. “Do you want to watch it?”

-

Our Civilisation: A Story, read a short fragment of subtitle, white on black, auto-translated by a program using the dictionary they had built up.

There was a brief shot of some semi-bipedal furry creature walking in the forest. Then one of a fossilised skeleton of something more bipedal and with a bigger head. Then stone tools: triangular ones that might have been spear tips, saw-toothed ones, clubs. A dash of fading red paint on a rock surface, in the shape of a cartoon version of that same bipedal body plan.

There were two pillars of stone in a desert on what looked like a pedestal, some faded inscription at its base and the lone and level sands stretching far away. There was a shot of an arrangement of rocks, some balancing on top of two others, amid a field of green. A massive pyramidal stone structure, lit by the rising sun.

Blocky written script etched on a stone tablet. Buildings framed by columns of marble. A marble statue of one of the aliens, a sling carelessly slung over its shoulder, immaculate in its detail. A spinning arrangement of supported balls orbiting a larger one. And still it moves, the subtitles flashed.

A collection of labelled geometric diagrams on faded yellow paper. Mathematical Principles of Natural Philosophy.

A great ornate building with a spire. A painting of a group of the aliens clad in colourful clothing. An ornate piece of writing. We hold these truths to be self-evident …

A painting of a steam locomotive barrelling along tracks. A diagram of a machine. A black-and-white picture of one of the aliens, then another. Government of the people, for the people, by the people, shall not perish …

An alien with white hair sticking up, holding a small stick of something white and with diagrams of cones behind him. Grainy footage of propeller aircraft streaking through the sky, and then of huge masses of people huddling together and walking across a barren landscape, and then of aliens all in the same clothes charging across a field, some of them suddenly jerking about and falling to the ground. We will fight on the beaches, we will fight on the landing grounds …

Black-and-white footage of a mushroom cloud slowly rising from a city below. A picture, in flat pale blue and white, showing a stylised representation of the world’s continents. The same picture, this time black-and-white, on the wall of a room where at least a hundred aliens were sitting.

An alien giving a speech. I have a dream. An alien, looking chubby in a space suit, standing on a barren rocky surface below an ink-black sky next to a pole with a colourful rectangle attached to it.

Three aliens in a room, looking at the camera and holding up a piece of printed text. Disease eradicated.

What looked like a primitive computer. A laptop computer. An abstract helical structure of balls connected by rods, and then flickering letters dancing across the screen.

A blank screen, an arrow extending left to right across it - time, flashed the subtitles - and then another arrow from the bottom-left corner upwards - people in poverty - and then a line crawling from left to right, falling as it did so.

A line folding itself up into a complicated shape. AI system cracks unsolved biology problem.

From then on, the screen showed pictures of headlines.

All routine writing tasks now a solved problem, claims AI company.

Office jobs increasingly automated.

Three-fourths of chief executives of companies on the [no translation] admit to using AI to help write emails, one-third have had AI write a shareholder letter or strategy document.

Exclusive report: world’s first fully-automated company, a website design agency.

Mass layoffs as latest version of [no translation] adopted at [no translation]; ‘stunning performance’ at office work.

Nations race to reap AI productivity gains: who will gain and who will lose?

CEO of [no translation] resigns, claiming job pointless, both internal and board pressure to defer to “excellently-performing” AI in all decisions.

[No translation] ousts executive and management team, announces layoffs; board supports replacing them with AI to keep up with competition.

Entirely or mostly automated companies now delivering 2.5x higher returns on investment on average; ‘the efficiency difference is no joke’, says chair of [no translation].

Year-on-year economic growth hits 21% among countries with advanced AI access.

Opinion: the new automated economy looks great on paper but is not serving the needs of real humans.

Mass protests after [no translation], a think-tank with the ear of the President, is discovered to be funded and powered by AI board of [no translation], and to have practically written national economic policy for the past two years.

‘No choice but forward’, says [no translation] after latest round of worries about AI; unprecedented economic growth still strong.

[No translation 1] orders raid of [no translation 2] over fears [no translation 2] is not complying with latest AI use regulations, but cannot execute order due to noncompliance from the largely-automated police force; ‘we are working with our AI advisers and drivers in accordance with protocol, and wish to assure the [no translation 3] people that we are still far from the sci-fi scenario where our own police cars have rebelled against us.’

‘AI overthrow’ fears over-hyped, states joint panel of 30 top AI scientists and business-people along with leading AI advisory systems; ‘they’re doing a good job maximising all relevant metrics and we should let them keep at it, though businesses need to do a better job of selecting metrics and tough regulation is in order.’

Opinion: we’re better-off under a regime of rigorous AI decision-making than under corrupt politicians; let the AIs repeat in politics what they’ve done for business over the last five years.

‘The statistics have never looked so good’ - Prime Minister reassures populace as worries mount over radical construction projects initiated by top AI-powered companies.

Expert panel opinion: direct AI overthrow scenario remains distant threat, but more care should be exercised over choice of target metrics; recommend banning of profit-maximisation target metric.

Movement to ban profit-maximising AIs picks up pace.

Top companies successfully challenge new AI regulation package in court.

‘The sliver of the economy over which we retain direct control will soon be vanishingly small’, warns top economist, ‘action on AI regulation may already be too late’.

Unverified reports of mass starvation in [no translation]; experts blame agricultural companies pivoting to more land-efficient industries.

Rant goes viral: ‘It’s crazy, man, we just have these office AIs that only exist in the cloud, writing these creepily-human emails to other office AIs, all overseen by yet another AI, and like most of their business is with other AI companies; they only talk to each other, they buy and sell from each other, they do anything as long as it makes those damned numbers on their spreadsheets just keep ticking up and up; I don’t think literally any human has ever seen a single product out of the factory that just replaced our former neighbourhood, but those factories just keep going up everywhere.’

Revolution breaks out in [no translation]; government overthrown, but it’s business-as-usual for most companies, as automated trains, trucks, and ships keep running.

[No translation] Revolution: Leaked AI-written email discovered, in which the AI CEO ordered reinforcement of train lines and trains three weeks ago. ‘We are only trying to ensure the continued functioning of our supply chains despite the recent global unrest, in order to best serve our customers’, CEO writes in new blog post.

[No translation] Revolution: crowds that tried swarming train lines run over by trains; ‘the trains didn’t even slow down’, claim witnesses. CEO cites fiduciary duties.

Despite unprecedented levels of wealth and stability, you can’t actually do much: new report finds people trying to move house, book flight or train tickets, or start a new job or company often find it difficult or impossible; companies prioritising serving ‘more lucrative’ AI customers and often shutting down human-facing services.

Expert report: ‘no sign of human-like consciousness even in the most advanced AI systems’, but ‘abundantly clear’ that ‘the future belongs to them’.

New report: world population shrinking rapidly; food shortages, low birth rates, anti-natalist attitudes fuelled by corporate campaigns to blame.

The screen went blank. Then a video of an alien appeared, sitting up on a rocky surface. Alice took a moment to realise that it was the same cave they had found the skeleton in. The alien’s skin was wrapped tight around its bones, and even across the vast gulf of biology and evolutionary history, Alice could tell that it was not far from death. It opened its mouth, and sound came out. Captions appeared beneath it.

“It is the end”, the alien said, its eyes staring at them from between long unkempt clumps of hair. “On paper, I am rich beyond all imagination. But I have no say in this new world. And I cannot find food. I will die.”

The wind tugged at the alien’s long hair, but otherwise the alien was so still that Alice wondered if it had died there and then.

“There is much I would like to say”, the alien said. “But I do not have the words, and I do not have the energy.” It paused. “I hope it was not all in vain. Or, that if for us it was, that for someone up there it isn’t.”

The video went blank.

Alice and Charlie watched the blank screen in silence.

“At least the blight they birthed seems to have stuck to their world”, Charlie said after a while.

“Yeah”, Alice said, slowly. “But I don’t think we’ll find anything here.”

Legacy completed nine more orbits of the planet, and then jettisoned all unnecessary mass into space. Its engines jabbed against the darkness of space, bright enough to be visible from the planet’s surface. There was no one to see them.

On a factory down on the planet, an assembly line of beady-eyed purple plush toys marched on endlessly.


The title of this work is taken from a passage in Superintelligence: Paths, Dangers, Strategies, where Nick Bostrom writes:

We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today—a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland without children. [emphasis added]

The outline of events presented draws inspiration from several sources, but most strongly on Paul Christiano’s article What failure looks like.


2022-09-27

Deciding not to found a human-data-for-alignment startup

8.6k words (~30 minutes)

Both the project and this write-up were a collaboration with Matt Putz.

 

Matt Putz and I worked together for the first half of the summer to figure out if we should found a startup with the purpose of helping AI alignment researchers get the datasets they need to train their ML models (especially in cases where the dataset is based on human-generated data). This post, also published on the Effective Altruism Forum and LessWrong (both of which may contain additional discussion in the comments), is a summary of our findings, and of why we decided not to do it.

Summary

One-paragraph summary: we (two recent graduates) spent about half of the summer exploring the idea of starting an organisation producing custom human-generated datasets for AI alignment research. Most of our time was spent on customer interviews with alignment researchers to determine if they have a pressing need for such a service. We decided not to continue with this idea, because there doesn’t seem to be a human-generated data niche (unfilled by existing services like Surge) that alignment teams would want outsourced.

 

In more detail: The idea of a human datasets organisation was one of the winners of the Future Fund project ideas competition, still figures on their list of project ideas, and had been advocated before then by some people, including Beth Barnes. Even though we ended up deciding against it, we think this was a reasonable and high-expected-value idea for these groups to advocate at the time.

Human-generated data is often needed for ML projects or benchmarks if a suitable dataset cannot be e.g. scraped from the web, or if human feedback is required. Alignment researchers conduct such ML experiments, but sometimes have different data requirements than standard capabilities researchers. As a result, it seemed plausible that there was some niche unfilled by the market to help alignment researchers solve problems related to human-generated datasets. In particular, we thought - and to some extent confirmed - that the most likely such niche is human data generation that requires particularly competent or high-skill humans. We will refer to this as high-skill (human) data.

We (Matt & Rudolf) went through an informal co-founder matching process along with four other people and were chosen as the co-founder pair to explore this idea. In line with standard startup advice, our first step was to explore whether or not there is a concrete current need for this product by conducting interviews with potential customers. We talked to about 15 alignment researchers, most of them selected on the basis of doing work that requires human data. A secondary goal of these interviews was to build better models for the future importance and role of human feedback in alignment.

Getting human-generated data does indeed cost many of these researchers significant time and effort. However, we think to a large extent this is because dealing with humans is inherently messy, rather than existing providers doing a bad job. Surge AI in particular seems to offer a pretty good and likely improving service. Furthermore, many companies have in-house data-gathering teams or are in the process of building them.

Hence we have decided to not further pursue this idea.

Other projects in the human data generation space may still be valuable, especially if the importance of human feedback in ML continues to increase, as we expect. This might include people specialising in human data as a career.

The types of factors that are most important for doing human dataset provision well include: high-skill contractors, fast iteration, and high bandwidth communication and shared understanding between the research team, the provider organisation and the contractors.

We are keen to hear other people’s thoughts, and would be happy to talk or to share more notes and thoughts with anyone interested in working on this idea or a similar one in the future.


 

Theory of Change

A major part of AI alignment research requires doing machine learning (ML) research, and ML research in turn requires training ML models. This involves expertise and execution ability in three broad categories: algorithms, compute, and data, the last of which is very neglected by EAs.

We expect training on data from human feedback to become an increasingly popular and very powerful tool in mainstream ML (see below). Furthermore, many proposals for alignment (for example: reinforcement learning from human feedback (RLHF) and variants like recursive reward modelling, iterated amplification, and safety via debate) would require lots of human interaction or datasets based on human-generated data.

While many services (most notably Surge) exist for finding labour to work on data generation for ML models, it seems plausible that an EA-aligned company could add significant value because:

  • Markets may not be efficient enough to fill small niches that are more important to alignment researchers than other customers; high-skill human data that requires very competent crowdworkers may be one such example. If alignment researchers can get it at all, it might be very expensive.
  • We have a better understanding of alignment research agendas, which may allow us to make better-informed decisions on many implementation details with less handholding, thereby saving researchers time.
  • We would have a shared goal with our customers: reducing AI x-risk. Though profit motives already provide decent incentives to offer a good service, mission alignment helps avoid adversarial dynamics, increases trust, and reduces friction in collaboration.
  • An EA-led company may be more willing to make certain strategic moves that go against its profit incentives; e.g. investing heavily into detecting a model’s potential attempts to deceive the crowdworkers, even when it’s hard for outsiders to tell whether such monitoring efforts are sincere and effective (and thus customers may not be willing to pay for it). Given that crowdworkers might provide a reward signal, they could be a key target for deceptive AIs.

Therefore, there is a chance that an EA-led human data service that abstracts out some subset of dataset-related problems (e.g. contractor finding, instruction writing/testing, UI and pipeline design/coding, experimentation to figure out best practices and accumulate that knowledge in one place) would:

  1. save the time of alignment researchers, letting them make more progress on alignment; and
  2. reduce the cost (in terms of time and annoying work) required to run alignment-relevant ML experiments, and therefore bring more of them below the bar at which it makes sense to run them, thus increasing the number of such experiments that are run.

In the longer run, benefits of such an organisation might include:

  • There is some chance that we could simply outcompete existing ML data generation companies and be better even in the cases where they do provide a service; this is especially plausible for relatively niche services. In this scenario we’d be able to exert some marginal influence over the direction of the AI field, for example by only taking alignment-oriented customers. This would amount to differential development of safety over capabilities. Beyond only working with teams that prioritise safety, we could also pick among self-proclaimed “safety researchers”; it is common for other members of the community to accuse proclaimed safety efforts of helping more with capabilities than alignment.
  • There are plausibly critical actions that might need to be taken for alignment, possibly quickly during “crunch-time”, that involve a major (in quality or scale) data-gathering project (or something like large-scale human-requiring interpretability work, that makes use of similar assets, like a large contractor pool). At such a time it might be very valuable to have an organisation committed to x-risk minimisation with the competence to carry out any such project.

Furthermore, if future AIs will learn human values from human feedback, then higher data quality will be equivalent to a training signal that points more accurately at human values. In other words, higher quality data may directly help with outer alignment (though we're not claiming that it could realistically solve it on its own). In discussions, it seemed that Matt gave this argument slightly more weight than Rudolf.

While these points are potentially high-impact, we think that there are significant problems with starting an organisation mainly to build capacity to be useful only at some hypothetical future moment. In particular, we think it is hard to know exactly what sort of capacity to build (and the size of the target in type-of-capacity space might be quite small), and there would be little feedback on the basis of which the organisation could improve or course-correct.

More generally, both of us believe that EA is right now partly bottlenecked by people who can start and scale high-impact organisations, which is a key reason why we’re considering entrepreneurship. This seems particularly likely given the large growth of the movement. 
 

What an org in this space may look like

Providing human datasets

The concept we most seriously considered was a for-profit that would specialise in meeting the specific needs of alignment researchers, probably by focusing on very high-skill human data. Since this niche is quite small, the company could offer a very custom-tailored service. At least for the first couple years, this would probably mean both of us having a detailed understanding of the research projects and motivations of our customers. That way, we could get a lot of small decisions right, without the researchers having to spend much time on it. We might be especially good at that compared to competitors, given our greater understanding of alignment.

Researching enhanced human feedback

An alternative we considered was founding a non-profit that would research how to enhance human feedback. See this post by Ajeya Cotra for some ideas on what this kind of research could look like. The central question is whether and how you can combine several weak training signals into a stronger, more accurate one. If this succeeded, maybe (enhanced) human feedback could become a more accurate (and thereby marginally safer) signal to train models on.
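To make the flavour of that research direction concrete, here is a minimal toy sketch (our own illustration, not something taken from Ajeya Cotra’s post): combining several unreliable binary labels into one label by weighting each rater according to their accuracy on a handful of questions with known answers. All names and numbers below are made up.

```python
import numpy as np

def estimate_reliability(rater_labels: np.ndarray, gold: np.ndarray) -> np.ndarray:
    """Estimate each rater's accuracy on a small set of questions with known ("gold") answers.

    rater_labels: (n_raters, n_gold) array of 0/1 labels.
    gold: (n_gold,) array of 0/1 correct answers.
    """
    return (rater_labels == gold).mean(axis=1)

def combine_weak_signals(rater_labels: np.ndarray, reliability: np.ndarray) -> np.ndarray:
    """Combine noisy 0/1 labels into one label per item via reliability-weighted voting.

    Raters near chance (50% accuracy) get ~zero weight; raters who are usually wrong
    get negative weight, so their votes count in reverse.
    """
    eps = 1e-6
    r = np.clip(reliability, eps, 1 - eps)
    weights = np.log(r / (1 - r))              # log-odds of each rater being right
    scores = weights @ (2 * rater_labels - 1)  # signed, weighted vote per item
    return (scores > 0).astype(int)

# Toy usage: three raters, calibrated on five gold questions, then used on three new items.
gold = np.array([1, 0, 1, 1, 0])
calibration = np.array([[1, 0, 1, 1, 0],    # rater A: perfect on the gold questions
                        [1, 0, 1, 0, 0],    # rater B: 80% accurate
                        [0, 1, 1, 0, 1]])   # rater C: mostly wrong
reliability = estimate_reliability(calibration, gold)
new_labels = np.array([[1, 1, 0],
                       [1, 0, 0],
                       [0, 0, 1]])          # each row: one rater's labels on the new items
print(combine_weak_signals(new_labels, reliability))   # combined label for each item
```

The interesting research questions are of course harder versions of this: what to do when there are no gold answers, when raters make correlated mistakes, or when the “labels” are free-form text rather than bits.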

We decided against this for a number of reasons:

  • Currently, neither of us has more research experience than an undergraduate research project.
  • We thought we could get a significant fraction of the benefits of this kind of research even if we did the for-profit version, and plausibly even more valuable expertise.
    • First of all, funders could have paid us to run any particular experiment they would have liked to see, although we freely admit that this is very different from someone pushing forward their own research agenda.
    • More importantly, we thought a lot of the most valuable expertise to be gained would come in the form of tacit knowledge and answers to concrete boring questions that are not best answered by doing “research” on them, but rather by iterating on them while trying to offer the best product (e.g. “Where do you find the best contractors?”, “How do you incentivize them?”, “What’s the best way to set up communication channels?”).
      • It is our impression that Ought pivoted away from doing abstract research on factored cognition and toward offering a valuable product for related reasons.
  • This topic seems plausibly especially tricky to research (though some people we’ve spoken to disagreed): 
    • At least some proposed such experiments would not involve ML models at all. We fear that this might make it especially easy to fool ourselves into thinking some experiment might eventually turn out to be useful when it won’t. More generally, the research would be pretty far removed from the end product (very high quality human feedback). In the for-profit case on the other hand, we could easily tell whether alignment teams were willing to pay for our services and iteratively improve. 

For-profit vs non-profit

We can imagine two basic funding models for this org: 

  • either we’re a nonprofit directly funded by EA donors and offering free or subsidized services to alignment teams;
  • or we’re a for-profit, paid by our customers (i.e. alignment teams).

Either way, a lot of the money will ultimately come from EA donors (who fund alignment teams).

The latter funding mechanism seems better; “customers paying money for a service” leads to the efficient allocation of resources by creating market structures, and customers have a clear incentive to spend the money well. On the other hand, “foundations deciding what services are free” is more reminiscent of planned economies and distorts markets. To a first approximation, funders should give alignment orgs as much money as they judge appropriate, and alignment orgs should then exchange it for services as they see fit.

A further reason is that a non-profit is legally more complicated to set up, and imposes additional constraints on the organisation.

Should the company exclusively serve alignment researchers?

We also considered founding a company with the ambition to become a major player in the larger space of human data provision. It would by default serve anyone willing to pay us and working on something AGI-related, rather than just alignment researchers. Conditional on us being able to successfully build a big company, this would have the following upsides:

  • Plausibly one of the main benefits of founding a human data gathering organisation is to produce EAs and an EA org that have deep expertise in handling and producing high-skill human data in significant quantities. That might prove useful around “crunch time”, e.g. when some project aims to create competitive but safe AGI and needs this expertise. Serving the entire market could scale to a much larger company enabling us to gain expertise at higher scales.
  • Operating a large company would also come with some degree of market power. Any company with paying customers has some amount of leverage over them: first of all just because of switching costs, but also because the product it offers might be much better than the next-best alternative. This could allow us to make some demands, e.g. once we’re big and established, announce we’d only work with companies that follow certain best practices.

On the other hand, building a big successful company serving anyone willing to pay might come with some significant downsides as well.

  • First, and most straightforwardly, it is probably much harder than filling a small niche (just meeting the specific needs of alignment researchers), making us less likely to succeed. A large number of competitors exist and, as described in this section, some of them (especially Surge) seem pretty hard to beat. Since this is an already big and growing market, there is an additional efficient-markets reason to assume this is true a priori.
  • Secondly, and perhaps more importantly, such a company might accelerate capabilities (more on this below).

Furthermore, it might make RLHF (Reinforcement Learning from Human Feedback) in particular more attractive. Depending on one’s opinions about RLHF and how it compares to other realistic alternatives, one might consider this a strong up- or downside. 

Approach

The main reason companies fail is that they build a product that customers don’t want. For for-profits, the signal is very clear: either customers care enough to be willing to pay hard cash for the product/service, or they don’t. For non-profits, the signal is less clear, and therefore nonprofits can easily stick around in an undead state, which, because of resource (mis)allocation and opportunity costs, is an even worse outcome than the quick death of a for-profit. As discussed, it is not obvious which structure we should adopt for this organisation, though for-profit may be a better choice on balance. However, in all cases it is clear that the organisation needs to solve a concrete problem or provide clear value in order to exist and be worth existing. This does not mean that the value proposition needs to be certain; we would be happy to take a high-risk, high-reward bet, and generally support hits-based approaches to impact, both in general and for ourselves.

An organisation is unlikely to do something useful to its customers without being very focused on customer needs, and ideally having tight feedback cycles. 

The shortest feedback loops are when you’re making a consumer software product where you can prototype quickly (including with mockups), and watch and talk to users as they use the core features, and then see if the user actually buys the product on the spot. A datasets service differs from this ideal feedback mode in a number of ways:

  1. The product is a labour-intensive process, which means the user cannot quickly use the core features and we cannot quickly simulate them.
  2. The actual service requires either a contractor pool or (potentially at the start) the two of us spending a number of hours per request generating data.
  3. There is significant friction to getting users to use the core feature (providing a dataset), since it requires specification of a dataset from a user, which takes time and effort.

Therefore, we relied on interviews with prospective customers. The goal of these interviews was to talk to alignment researchers who work with data, and figure out if external help with their dataset projects would be of major use to them.

Our approach to customer interviews was mostly based on the book The Mom Test, which is named after the idea that your customer interview questions should be concrete and factual enough that even someone as biased as your own mom shouldn’t be able to give you a false signal about whether the idea is actually good. Key lessons from The Mom Test include emphasising:

  • factual questions about the past over hypothetical questions for the future;
    • In particular, questions about concrete past and current efforts spent solving a problem rather than questions about current or future wishes for solving a problem
  • questions that get at something concrete (e.g. numbers); and
  • questions that prompt the customer to give information about their problems and priorities without prompting them with a solution.

We wanted to avoid the failure mode where lots of people tell us something is important and valuable in the abstract, without anyone actually needing it themselves.

We prepared a set of default questions that roughly divided into:

  1. A general starting question prompting the alignment researcher to describe the biggest pain points and bottlenecks they face in their work, without us mentioning human data.
  2. Various questions about their past and current dataset-related work, including what types of problems they encounter with datasets, how much of their time these problems take, and steps they took to address these problems.
  3. Various questions on their past experiences using human data providers like Surge, Scale, or Upwork, and specifically about any things they were unable to accomplish because of problems with such services.
  4. In some cases, more general questions about their views on where the bottlenecks for solving alignment are, views on the importance of human data or tractability of different data-related proposals, etc. 
  5. What we should’ve asked but didn’t, and who else we should talk to.

Point 4 represents the fact that in addition to being potential customers, alignment researchers also doubled as domain experts. The weight given to the questions described in point 4 varied a lot, though in general if someone was both a potential customer and a source of data-demand-relevant alignment takes, we prioritised the customer interview questions.

In practice, we found it easy to arrange meetings with alignment researchers; they generally seemed willing to talk to people who wanted input on their alignment-relevant idea. We did customer interviews with around 15 alignment researchers, and had second meetings with a few. For each meeting, we prepared beforehand a set of questions tweaked to the particular person we were meeting with, which sometimes involved digging into papers published by alignment researchers on datasets or dataset-relevant topics (Sam Bowman in particular has worked on a lot of data-relevant papers). Though the customer interviews were by far the most important way of getting information on our cruxes, we found the literature reviews we carried out to be useful too. We are happy to share the notes from the literature reviews we carried out; please reach out if this would be helpful to you.

Though we prepared a set of questions beforehand, in many meetings - including often the most important or successful ones - we ended up going off script fairly quickly.

Something we found very useful was that, since there were two of us, we could split the tasks during the meeting into two roles (alternating between meetings):

  1. One person who does most of the talking, and makes sure to be focused on the thread of the conversation.
  2. One person who mostly focuses on note-taking, but also pipes in if they think of an important question to ask or want to ask for clarification.

Key crux: demand looks questionable, Surge seems pretty good

Common startup advice is to make sure you have identified a very strong signal of demand before you start building stuff. That should look something like someone telling you that the thing you’re working on is one of their biggest bottlenecks and that they can’t wait to pay you to solve this problem for them as soon as possible. “Nice to have” doesn’t cut it. This is in part because working with young startups is inherently risky, so you need to make up for that by solving one of their most important problems.

In brief, we don’t think this level of very strong demand currently exists, though there were some weaker signals that looked somewhat promising. There are many existing startups that offer human feedback already. Surge AI in particular was brought up by many people we talked to and seems to offer quite a decent service that would be hard to beat.

Details about Surge

Surge is a US-based company that offers a service very similar to what we had in mind, though they are not focused on alignment researchers exclusively. They build data-labelling and generation tools and have a workforce of crowdworkers.

They’ve worked with Redwood and the OpenAI safety team, both of which had moderately good experiences with them. More recently, Ethan Perez’s team have worked with Surge too; he seems to be very satisfied based on this Twitter thread.


 

Collaboration with Redwood

Surge has worked with Redwood Research on their paper about adversarial training. This is one of three case studies on Surge’s website, so we assume it’s among the most interesting projects they’ve done so far. The crowdworkers were tasked with coming up with prompts that would cause the model to output text in which someone got injured. Furthermore, crowdworkers also classified whether someone got injured in a given piece of text.

One person from Redwood commented that doing better than Surge seemed possible to them with “probably significant value to be created”, but “not an easy task”. They thought our main edge would have to be that we’d specialise on fuzzy and complex tasks needed for alignment; Surge apparently did quite well with those, but still with some room for improvement. A better understanding of alignment might lower chances of miscommunication. Overall, Redwood seems quite happy with the service they received.

Initially, Surge’s iteration cycle was apparently quite slow, but this improved over time and was “pretty good” toward the end.

Redwood told us they were quite likely to use human data again by the end of the year and more generally in the future, though they had substantial uncertainty around this. As we understood it, their overall experience of working with human feedback was somewhat painful. This is part of the reason they’re uncertain about how much human feedback they will use for future experiments, even though it’s quite a powerful tool. However, they estimated that friction in working with human feedback was mostly caused by inherent reasons (humans are inevitably slower and messier than code), rather than Surge being insufficiently competent.

Collaboration with OpenAI

OpenAI have worked with Surge in the context of their WebGPT paper. In that paper, OpenAI fine-tuned their language model GPT-3 to answer long-form questions. The model is given access to the web, where it can search and navigate in a text-based environment. It’s first trained with imitation learning and then optimised with human feedback. 

Crowdworkers provided “demonstrations”, where they answered questions by browsing the web. They also provided “comparisons”, where they indicated which of two answers to the same question they liked better.

People from OpenAI said they had used Surge mostly for sourcing the contractors, while doing most of the project management, including building the interfaces, in-house. They were generally pretty happy with the service from Surge, though all of them did mention shortcomings.

One of the problems they told us about was that it was hard to get access to highly competent crowdworkers for consistent amounts of time. Relatedly, it often turned out that a very small fraction of crowdworkers would provide a large majority of the total data. 

More generally, they wished there had been someone at Surge that understood their project better. Also, it might have been somewhat better if there had been more people with greater experience in ML, such that they could have more effectively anticipated OpenAI’s preferences — e.g. predict accurately what examples might be interesting to researchers when doing quality evaluation. However, organisational barriers and insufficient communication were probably larger bottlenecks than ML knowledge. At least one person from OpenAI strongly expressed a desire for a service that understood their motives well and took as much off their plate as possible in terms of hiring and firing people, building the interfaces, doing quality checks and summarising findings etc. It is unclear to us to what extent Surge could have offered these things if OpenAI hadn’t chosen to do a lot of these things in-house. One researcher suggested that communicating their ideas reliably was often more work than just doing it themselves. As it was, they felt that marginal quality improvement required significant time investment on their own part, i.e. could not be solved with money alone. 

Notably, one person from OpenAI estimated that about 60% of the WebGPT team’s efforts were spent on various aspects of data collection. They also said that this figure didn’t change much after weighting for talent, though in the future they expect junior people to take on a disproportionate share of this workload.

Finally, one minor complaint that was mentioned was the lack of transparency about contractor compensation. 

How mission-aligned is Surge?

Surge highlight their collaboration with Redwood on their website as one of three case studies. In their blog post about their collaboration with Anthropic, the first sentence reads: “In many ways, alignment – getting models to align themselves with what we want, not what they think we want – is one of the fundamental problems of AI.” 

On the one hand, they describe alignment as one of the fundamental problems of AI, which could indicate that they intrinsically cared about alignment. However, they have a big commercial incentive to say this. Note that many people would consider their half-sentence definition of alignment to be wrong (a model might know what we want, but still do something else).

We suspect that the heads of Surge have at least vaguely positive dispositions towards alignment. They definitely seem eager to work with alignment researchers, which might well be more important. We think it’s mostly fine if they are not maximally intrinsically driven, though mission alignment does add value as mentioned above.

Other competitors

We see Surge as the most direct competitor and have researched them by far in the most detail. But besides Surge, there are a large number of other companies offering similar services. 

First, and most obviously, Amazon Mechanical Turk offers a very low-quality version of this service and is very large. Upwork specialises in sourcing humans for various tasks, without building interfaces. ScaleAI is a startup with a $7B valuation that augments human feedback with various automated tools; OpenAI have worked with them. Other companies in this broad space include Hybrid (which Sam Bowman’s lab has worked with) and Invisible (who have worked with OpenAI). There are many more that we haven’t listed here.

In addition, some labs have in-house teams for data gathering (see here for more).

Data providers used by other labs

Ethan Perez’s and Sam Bowman’s labs at NYU/Anthropic have historically often built their own interfaces while using contractors from Upwork or undergrads, but they have been trialing Surge over the summer and seem likely to stick with them if they have a good experience. Judging from the Twitter thread linked above and asking Jérémy Scheurer (who works on the team and built the pre-Surge data pipeline) how they’ve found Surge so far, Surge is doing a good job. 

Google has an internal team that provides a similar service, though DeepMind have used at least one external provider as well. We expect that it would be quite hard to get DeepMind to work with us, at least until we would be somewhat more established. 

Generally, we get the impression that most people are quite happy with Surge. It’s worth also considering that it’s a young company that’s likely improving its service over time. We’ve heard that Surge iterates quickly, e.g. by shipping simple feature requests in two days. It’s possible that some of the problems listed above may no longer apply by now or in a few months.

Good signs for demand

One researcher we talked to said that there were lots of projects their team didn’t do, because gathering human feedback of sufficient quality was infeasible. 

One of the examples this researcher gave was human feedback on code quality. This is impractical because the time of software engineers is simply too expensive, a problem that would be hard for a new org to solve.

Another example they gave seemed like it might be more feasible: for things like RLHF, they often choose to collect pairwise comparisons between examples, or preferences over several examples. Ideally, they would want to get ratings, e.g. on a scale from 1 to 10, but they didn’t trust the reliability of their raters enough to do this.
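For context on why pairwise comparisons are attractive when raters are unreliable: the standard way to turn comparisons into a training signal in RLHF-style setups is a Bradley-Terry-style reward model, which only needs raters to say which of two outputs is better, never to place outputs on an absolute 1-10 scale. Below is a toy numpy sketch of the general idea, with entirely synthetic data and a linear “reward model” standing in for a neural network; it is not the pipeline of any particular lab.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "model output" is a feature vector x; the reward model is linear, r(x) = w @ x.
dim, n_pairs = 4, 200
true_w = rng.normal(size=dim)                 # the "true" notion of quality we hope to recover
xs_a = rng.normal(size=(n_pairs, dim))
xs_b = rng.normal(size=(n_pairs, dim))

# Simulate raters who usually, but not always, prefer the genuinely better output
# (Bradley-Terry choice probabilities); no absolute ratings are ever requested.
p_prefer_a = 1 / (1 + np.exp(-(xs_a - xs_b) @ true_w))
prefers_a = rng.random(n_pairs) < p_prefer_a
preferred = np.where(prefers_a[:, None], xs_a, xs_b)
rejected = np.where(prefers_a[:, None], xs_b, xs_a)

# Fit w by gradient descent on the pairwise logistic loss: -log sigmoid(r(preferred) - r(rejected)).
w, lr = np.zeros(dim), 0.1
for _ in range(500):
    margin = (preferred - rejected) @ w
    grad = -((1 - 1 / (1 + np.exp(-margin)))[:, None] * (preferred - rejected)).mean(axis=0)
    w -= lr * grad

cosine = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(f"recovered reward direction, cosine similarity to truth: {cosine:.2f}")
```

The point of the sketch is just that a usable training signal can be recovered from noisy relative judgments, whereas absolute ratings demand a level of calibration across raters that the researcher quoted above did not trust their raters to have.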

More generally, this researcher thought there were lots of examples where if they could copy any person on their team a hundred times to provide high-skill data, they could do many experiments that they currently can’t. 

They also said that their team would be willing to pay ~3x of what they were paying currently to receive much higher-quality feedback.

Multiple other researchers we talked to expressed vaguely similar sentiments, though none quite as strong.

However, it’s notable that in this particular case, the researcher hadn’t worked with Surge yet. 

The same researcher also told us about a recent project where they had spent a month on things like creating quality assurance examples, screening raters, tweaking instructions etc. They thought this could probably have been reduced a lot by an external org, maybe to as little as one day. Again, we think Surge may be able to get them a decent part of the way there.

Labs we could have worked with

We ended up finding three projects that we could have potentially worked on:

  • A collaboration with Ought: they spend about 15 hours a week on data-gathering and would have been happy to outsource that to us. If it had gone well, they might also have done more data-gathering in the long term (since friction is lower if it doesn’t require staff time). We decided not to go ahead with this project since we weren’t optimistic enough that demand from other labs would be bigger once we had established competence with Ought, and the project itself didn’t seem to have high enough upside.
  • Attempt to get the Visible Thoughts bounty by MIRI. We decided against this for a number of reasons. See more of our thinking about Visible Thoughts below.
  • Potentially a collaboration with Owain Evans on curated datasets for alignment.

We think the alignment community is currently relatively tight-knit; for example, researchers often knew about other alignment teams’ experiences with Surge from conversations they had had with them. Hence, we were relatively optimistic that, conditional on there being significant demand for this kind of service, doing a good job on one of the projects above would quickly lead to more opportunities.
 

Visible Thoughts

In November 2021, MIRI announced the Visible Thoughts (VT) project bounty. In many ways VT would be a good starting project for an alignment-oriented dataset provider, in particular because the bounty is large (up to $1.2M) and because it is ambitious enough that executing on it would provide a strong learning signal to us and a credible signal to other organisations we might want to work with. However, on closer examination of VT, we came to the conclusion that it is not worth it for us to work on it.

The idea of VT is to collect a dataset of 100 runs of fiction of a particular type (“dungeon runs”, an interactive text-based genre where one party, called the “dungeon master” and often an AI, offers descriptions of what is happening, and the other responds in natural language with what actions they want to take), annotated with a transcript of some of the key verbal thoughts that the dungeon master might be thinking as they decide what happens in the story world. MIRI hopes that this would be useful for training AI systems that make their thought processes legible and modifiable.

In particular, a notable feature of the VT bounty is the extreme run lengths that it asks for: to the tune of 300,000 words for each of the runs (for perspective, this is the length of A Game of Thrones, and longer than the first three Harry Potter books combined). A VT run is much less work than a comparable-length book - the equivalent of a rough unpolished first draft (with some quality checks) would likely be sufficient - but producing one such run would still probably require at least on the order of 3 months of sequential work time from an author. We expect the pool of people willing to write such a story for 3 months is significantly smaller than the pool of people who would be willing to complete, say, a 30,000-word run, and that the high sequential time cost increases the amount of time required to generate the same number of total words. We and MIRI also appear to have different ideas on how easy it is to fit a coherent story, for the relevant definition of coherent, into a given number of words. Note that to compare VT word counts to lengths of standard fiction without the written-out thoughts from the author, the VT word count should be reduced by a factor of 5-6.

Concerns about the length are raised in the comments section, to which Eliezer Yudkowsky responded. His first point, that longer runs are easier to write per step, may be true, especially as we also learned (by email with Nate Soares and Aurelien Cabanillas) that in MIRI’s experience “authors that are good at producing high quality steps are also the ones who don't mind producing many steps”. In particular because of that practical experience, we think it is possible we overestimated the logistical problems caused by the length. MIRI also said they would likely accept shorter runs too if they satisfied their other criteria.

In a brief informal conversation with Rudolf during EAG SF, Eliezer emphasised the long-range coherence point in particular. However, they did not come to a shared understanding of what type of “long-range coherence” is meant.

More important than these considerations, though, is that we are sceptical about the vague plans for what to do with a VT dataset once it exists. A recurring theme from talking to alignment researchers who work with datasets was that inventing and creating a good dataset is surprisingly hard, and generally involves having a clear goal of what you’re going to use the dataset for. It is possible the key here is the difference in our priors for how likely a dataset idea is to be useful.

In addition, we have significant concerns about undertaking a major project based on a bounty whose only criterion is the judgement of one person (Eliezer Yudkowsky), and undertaking such a large project as our first project.

Other cruxy considerations

Could we make a profit / get funding? 

One researcher from OpenAI told us he thought it would be hard to imagine an EA data-gathering company making a profit because costs for individual projects would always be quite high (requiring several full-time staff), and total demand was probably not all that big.

In terms of funding, both of us were able to spend time on this project because of grants from regrantors in the Future Fund regrantor program. Based on conversations with regrantors, we believe we could’ve gotten funding to carry out an initial project if we had so chosen.

Will human feedback become a much bigger deal? Is this a very quickly growing industry?

Our best guess is yes. For example, see this post by Ajeya Cotra which outlines how we could get to TAI by training on Human Feedback on Diverse Tasks (HFDT). 

She writes: “HFDT is not the only approach to developing transformative AI, and it may not work at all. But I take it very seriously, and I’m aware of increasingly many executives and ML researchers at AI companies who believe something within this space could work soon.”

In addition, we have also had discussions with at least one other senior AI safety researcher whom we respect and who thought human feedback was currently irrationally neglected by mainstream ML; they expected it to become much more widespread and to be a very powerful tool.

If that’s right, then providing human feedback will likely become important and economically valuable. 

This matters, because operating a new company in a growing industry is generally much easier and more likely to be successful. We think this is true even if profit isn’t the main objective.

Would we be accelerating capabilities?

Our main idea was to found a company (or possibly non-profit) that served alignment researchers exclusively. That could accelerate alignment differentially. 

One problem is that it’s not clear where to draw this boundary. Some alignment researchers definitely think that other people who would also consider themselves to be alignment researchers are effectively doing capabilities work. This is particularly true of RLHF.

One mechanism worth taking seriously: if we worked with big AI labs to make their models more aligned by providing higher-quality data, the models might merely appear aligned on the surface. “Make the data higher quality” might be a technique that scales poorly as capabilities ramp up, so it risks creating a false sense of security. It would also clearly improve the usefulness of current-day models, and hence risks increasing investment levels too.

We don’t currently think the risk of surface-level alignment is big enough to outweigh the benefits. In general, we think that a good first-order heuristic, one that helps the field stay grounded in reality, is that whatever improves alignment in current models is worth exploring further and investing resources into. It seems like a good prior that such things would also be valuable in the future (even if it’s possible that new additional problems may arise, or such efforts aren’t on the path to a future alignment solution). See Nate Soares’ post about sharp left turns for a contrasting view on this.

Is it more natural for this work to be done in-house in the long term? Especially at big labs/companies.

We expect that human data gathering is likely to become very important and that it benefits from understanding the relevant research agenda well. So maybe big companies will want to do this internally, instead of relying on third-party suppliers? 

That seems quite plausible to us and to some extent it’s happening already. Our understanding is that Anthropic is hiring an internal team to do human data gathering. DeepMind has access to Google’s crowdworker service. OpenAI have worked with multiple companies, but they also have at least one in-house specialist for this kind of work and are advertising multiple further jobs on the human data team here. They’re definitely considering moving more of this work in-house, but it’s unclear to us to what extent that’s going to happen, and we have received somewhat contradictory signals regarding OpenAI safety team members’ preferences on this.

So a new EA org would face stiff competition, not only from other external providers, but also from within companies.

Of course, smaller labs will most likely always have to rely on external providers. Hence, another cruxy consideration is how much small labs matter. Our intuition is that they matter much less than bigger labs (since the latter have access to the best and biggest models).

Creating redundancy of supply and competition

Even if existing companies are doing a pretty good job at serving the needs of alignment researchers, there’s still some value in founding a competitor. 

First, competition is good. Founding a competitor puts pressure on existing providers to keep service quality high, keep improving their products, and keep margins low. Ironically, part of the value of founding this company would thus flow through getting existing companies to try harder to offer the best product.

Second, it creates some redundancy. What if Surge pivots? What if their leadership changes or they become less useful for some other reason? In those worlds it might be especially useful to have a “back-up” company.

Both of these points have been mentioned to us as arguments in favour of founding this org. We agree that these effects are real and likely point in favour of founding the org. However, we don’t think these factors carry very significant weight relative to our opportunity costs, especially given that there are already many start-ups working in this space. 

Adding a marginal competitor can only affect a company’s incentives so much. And in the worlds where we’d be most successful such that all alignment researchers were working with us, we might cause Surge and others to pivot away from alignment researchers, instead of getting them to try harder. 

The redundancy argument only applies in worlds in which the best provider ceases to exist; maybe that’s 10% likely. And then the next best alternative is likely not all that bad. Competitors are plentiful and even doing it in-house is feasible. Hence, it seems unlikely to us that the expected benefit here is very large after factoring in the low probability of the best provider disappearing.

Other lessons

Lessons on human data gathering

In the process of talking to lots of experts about their experiences in working with human data, we learned many general lessons about data gathering. This section presents some of those lessons, in roughly decreasing order of importance.

Iteration

Many people emphasized to us that working with human data rarely looks like having a clean pipeline from requirements design to instruction writing to contractor finding to finished product. Rather, it more often involves a lot of iteration and testing, especially regarding what sort of data the contractors actually produce. While some of this iteration may be removed by having better contractors and better knowledge of good instruction-writing, the researchers generally view the iteration as a key part of the research process, and therefore prize:

  • ease of iteration (especially time to get back with a new batch of data based on updated instructions); and
  • high-bandwidth communication with the contractors and whoever is writing the instructions (often both are done by the researchers themselves). 

The latter matters so much that it is somewhat questionable whether an external provider (rather than e.g. a new team member deeply enmeshed in the context of the research project) could even be a good fit for this need.

The ideal pool of contractors

All of the following features matter in a pool of contractors:

  • Competence, carefulness, intelligence, etc. (sometimes expertise). It is often ideal if the contractors understand the experiment.
  • Number of contractors
  • Quick availability and therefore low latency for fulfilling requests
  • Consistent availability (ideally full-time)
  • Even distribution of contributions across contractors (i.e. it shouldn’t be the case that 20% of the contractors provide 80% of the examples).

Quality often beats quantity for alignment research

Many researchers told us that high-quality, high-skill data is usually more important and more of a bottleneck than just a high quantity of data. Some of the types of projects where current human data generation methods are most obviously deficient are cases where a dataset would need epistemically competent people to make subtle judgments, e.g. of the form “how true is this statement?” or “how well-constructed was this study?” As an indication of reference classes where the necessary epistemic level exists, one researcher mentioned subject-matter experts in their domain, LessWrong posters, and EAs.

A typical data gathering project needs UX design, ops, ML, and data science expertise

These specialists might respectively focus on the following:

  • Designing the interfaces that crowdworkers interact with. (UX-expert/front-end web developer)
  • Managing all operations, including hiring, paying, managing, and firing contractors, communicating with them and the researchers etc. (ops expert)
  • Helping the team make informed decisions about the details of the experimental design, while minimizing time costs for the customer. The people we spoke to usually emphasized ML-expertise more than alignment expertise. (ML-expert)
  • Meta-analysis of the data, e.g. inter-rater agreement, the distribution of how much each contractor contributed, demographics, and noticing any other curious aspects of the data (see the short sketch below). (data scientist)

It is possible that someone in a team could have expertise in more than one of these areas, but generally this means a typical project will involve at least three people.
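As a small illustration of what the data-science role involves day to day, here is a sketch, with made-up data and hypothetical contractor names, of two of the routine checks mentioned in the list above: inter-rater agreement and how concentrated contributions are across contractors.

```python
from collections import Counter

from sklearn.metrics import cohen_kappa_score

# Two raters labelling the same ten items (e.g. "did someone get injured in this text?").
rater_1 = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print("Cohen's kappa:", cohen_kappa_score(rater_1, rater_2))  # agreement, corrected for chance

# Which (hypothetical) contractor produced each example in the final dataset.
contractor_per_example = ["ann"] * 60 + ["bob"] * 25 + ["cat"] * 10 + ["dan"] * 5
counts = Counter(contractor_per_example)
total = sum(counts.values())
for contractor, n in counts.most_common():
    print(f"{contractor}: {n / total:.0%} of examples")
# If a handful of contractors dominate, availability and data quality both end up hinging
# on a few individuals (the "20% of contractors provide 80% of examples" issue noted earlier).
```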

Crowdworkers do not have very attractive jobs

Usually the crowdworkers are employed as contractors. This means their jobs are inherently not maximally attractive; they probably don’t offer much in the way of healthcare, employment benefits, job security, status etc. The main way that these jobs are made more attractive is through offering higher hourly rates.

If very high-quality, high-skill data is going to become essential for alignment, it may be worth considering changing this in order to attract more talented people.

However, we expect that it might be inherently very hard to offer permanent positions for this kind of work, since demand is likely variable and since different people may be valuable for different projects. This is especially true for a small organisation. 

What does the typical crowdworker look like?

This varies a lot between projects and providers.

The cheapest are non-native English speakers who live outside of the US.

Some platforms, including Surge, offer the option to filter crowdworkers for things like being native English-speakers, expertise as a software engineer, background in finance, etc.

Bottlenecks in alignment

When asked to name the factors most holding back their progress on alignment, many alignment researchers mentioned talent bottlenecks. 

The most common talent bottleneck seemed to be in competent ML-knowledgeable people. Some people mentioned the additional desire for these to understand and care about alignment. (Not coincidentally, Matt’s next project is likely going to be about skilling people up in ML).

There were also several comments about things like good web development experience being important. For example, many data collection projects involve creating a user interface at some point, and in practice this is often handled by ML-specialised junior people at the lab, who can, with some effort and given their programming background, cobble together some type of website - often using different frameworks and libraries than the next person knows (or wants to use). (When asked about why they don’t hire freelance programmers, one researcher commented that a key feature they’d want is the same person working for them for a year or two, so that there’s an established working relationship, clear quality assurances, and continuity with the choice of technical stack.)

Conclusion

After having looked into this project idea for about a month, we have decided not to found a human data gathering organisation for now. 

This is mostly because demand for an external provider seems insufficient, as outlined in this section. No lab gave a clear signal that gathering human data was a key bottleneck for them, where they would have been willing to go to significant lengths to fix it urgently (especially not the ones that had tried Surge). 

We expect that many labs would want to stick with their current providers, Surge in particular, or their in-house team, bar exceptional success on our part (even then, we’d only provide so much marginal value over those alternatives).

Though we did find some opportunities for potential initial projects after looking for a month, we are hesitant about how far this company would be expected to scale. One of the main draws (from an impact perspective) of founding an organisation is that you can potentially achieve very high counterfactual impact by creating an organisation that scales to a large size and does lots of high-impact work over its existence. The absence of a plausible pathway to really outstanding outcomes from starting this organisation is a lot of what deters us.

In a world where we’re more successful than expected (say 90th to 95th percentile), we could imagine that five years from now, we’d have a team of about ten good people. This team may be working with a handful of moderately big projects (about as big as WebGPT), and providing non-trivial marginal value over the next-best alternative to each one of them. Maybe one of these projects would not have been carried out without us.

A median outcome might mean failing to make great hires and remaining relatively small and insignificant: on the scale of doing projects like the ones we’ve identified above, enough to keep us busy throughout the year and provide some value, but with little scaling. In that case we would probably quit the project at some point.

This distribution doesn’t seem good enough to justify our opportunity cost (which includes other entrepreneurial projects or technical work among other things). Thus we have decided not to pursue this project any further for now.

We think this was a good idea to invest effort in pursuing, and we think we made the right call in choosing to investigate it. Both of us are open to evaluating, and indeed quite likely to evaluate, other EA-relevant entrepreneurial project ideas in the future.

Other relevant human data-gathering work

However, the assumption that high-quality, high-skill human feedback is important and neglected by EAs has not been falsified.

It is still plausible to us that EAs should consider career paths that focus on building expertise at data-gathering; just probably not by founding a new company. In the short run, this could look like

  • Contributing to in-house data-gathering teams (e.g. Anthropic, OpenAI, etc.)
  • Joining Surge or other data-gathering startups.

As we discussed above, the types of skills that seem most relevant for working in a human data generation role include: data science experience (in particular with natural language data, social science data, and experiment design), front-end web development, ops and management skills, and some understanding of machine learning and alignment. 80,000 Hours recently wrote a profile which you can find here.

Of course, in the short term, this career path will be especially impactful if one’s efforts are focussed on helping alignment researchers. But if it’s true that human feedback will prove a very powerful tool for ML, then people with such expertise may become increasingly valuable going forward, such that it could easily be worth skilling up at a non-safety-focused org. 

We think joining Surge may be a particularly great opportunity. It is common advice that joining young, rapidly growing start-ups with good execution is great for building experience; early employees can often get a lot of responsibility early on. See e.g. this post by Bill Zito.

One of the hardest parts about that seems to be identifying promising startups. After talking to many of their customers, we have built reasonable confidence that Surge holds significant promise. They seem to execute well, in a space which we expect to grow. In addition to building career capital, there is clear value in helping Surge serve alignment researchers as well as possible.

From Surge’s perspective, we think they could greatly benefit from hiring EAs, who are tuned in to the AI safety scene, which we would guess represents a significant fraction of their customers. 

One senior alignment researcher told us explicitly that they would be interested in hiring people who had worked in a senior role at Surge.

Next steps for us

Matt is planning to run a bootcamp that will allow EAs to upskill in ML engineering. I'll be doing a computer science master’s at Cambridge from October to June.