// @flow

import * as React from "react";
import { Link } from "react-router-dom";

import Presentation from "../../components/Presentation/Presentation";
import Page from "../../components/Page/Page";

const FactoredCognitionMay2018 = () => (
  <Page
    className="FactoredCognitionMay2018"
    title="Factored Cognition (May 2018)"
    next={{
      url: "/research/factored-cognition",
      title: "Factored Cognition",
      description: "Project overview"
    }}
    header={{
      description:
        "A presentation of Ought's approach to automating human reasoning using ML.",
      imageURL: "https://ought.org/images/presentation-fc-2018-05-preview.png"
    }}
  >
    <Presentation
      imageURL={index =>
        `/images/presentations/2018-factored-cognition/2018-factored-cognition.${index
          .toString()
          .padStart(3, "0")}.jpeg`
      }
      skip={[44, 46, 47]}
      minIndex={1}
      maxIndex={54}
      showSlideNumbers={true}
      titles={{}}
      toc={false}
      annotations={{
        "1": (
          <p>
            I'll talk about <em>Factored Cognition</em>, our current main
            project at Ought. This is joint work with Ozzie Gooen, Ben Rachbach,
            Andrew Schreiber, Ben Weinstein-Raun, and (as board members) Paul
            Christiano and Owain Evans.
          </p>
        ),

        "2": (
          <div>
            <p>
              Before I get into the details of the project, I want to talk about
              the broader research program that it is part of. And to do that, I
              want to talk about research programs for AGI more generally.
            </p>
            <p>
              Right now, the dominant paradigm for researchers who explicitly
              work towards AGI is what you could call "scalable learning and
              planning in complex environments". This paradigm substantially
              relies on training agents in simulated physical environments to
              solve tasks that are similar to the sorts of tasks animals and
              humans can solve, sometimes in isolation and sometimes in
              competitive multi-agent settings.
            </p>
            <p>
              To be clear, not all tasks are physical tasks. There's also
              interest in more abstract environments as in the case of playing
              Go, proving theorems, or participating in goal-based dialog.
            </p>
          </div>
        ),

        "3": (
          <p>
            For our purposes, the key characteristic of this research paradigm
            is that agents are optimized for success at particular tasks. To the
            extent that they learn particular decision-making strategies, those
            are learned implicitly. We only provide external supervision, and it
            wouldn't be entirely wrong to call this sort of approach
            "recapitulating evolution", even if this isn't exactly what is going
            on most of the time.
          </p>
        ),

        "4": (
          <div>
            <p>
              As many people have pointed out, it could be difficult to become
              confident that a system produced through this sort of process is
              aligned - that is, that all its cognitive work is actually
              directed towards solving the tasks it is intended to help with.
              The reason for this is that alignment is a property of the
              decision-making process (what the system is "trying to do"), but
              that process is unobserved and only implicitly controlled.
            </p>
            <p>
              Aside: Could more advanced approaches to transparency and
              interpretability help here? They'd certainly be useful in
              diagnosing failures, but unless we can also leverage them for
              training, we might still be stuck with architectures that are
              difficult to align.
            </p>
            <p>
              What's the alternative? It is what we could call{" "}
              <em>internal supervision</em> - supervising not just input-output
              behavior, but also cognitive processes. There is some prior work,
              with{" "}
              <a href="https://arxiv.org/abs/1511.06279?context=cs">
                Neural Programmer-Interpreters
              </a>{" "}
              perhaps being the most notable instance of that class. However,
              depending on how you look at it, there is currently much less
              interest in such approaches than in end-to-end training, which
              isn't surprising: A big part of the appeal of AI over traditional
              programming is that you don't need to specify how exactly problems
              are solved.
            </p>
          </div>
        ),

        "5": (
          <div>
            <p>
              In this talk, I'll discuss an alternative research program for AGI
              based on internal supervision. This program is based on imitating
              human reasoning and meta-reasoning, and is much less developed
              than the one based on external supervision and training in
              complex environments.
            </p>
            <p>
              The goal for this alternative program is to codify reasoning that
              people consider "good" ("helpful", "corrigible", "conservative").
              This could include some principles of good reasoning that we know
              how to formalize (such as probability theory and expected value
              calculations), but could also include heuristics and sanity checks
              that are only locally valid.
            </p>
            <p>
              For a system built this way, it could be substantially easier to
              become confident that it is aligned. Any bad outcomes would need
              to be produced through a sequence of human-endorsed reasoning
              steps. This is far from a guarantee that the resulting behavior is
              good, but seems like a much better starting point. (See e.g.{" "}
              <a href="http://effective-altruism.com/ea/1ca/my_current_thoughts_on_miris_highly_reliable/#s6">
                Dewey 2017
              </a>
              .)
            </p>
            <p>
              The hope would be to (wherever possible) punt on solving hard
              problems such as what decision theory agents should use, and how
              to approach epistemology and value learning, and instead to build
              AI systems that inherit our epistemic situation, i.e. that are
              uncertain about those topics to the extent that we are uncertain.
            </p>
          </div>
        ),

        "6": (
          <p>
            I've described external and internal supervision as different
            approaches, but in reality there is a spectrum, and it is likely
            that practical systems will combine both.
          </p>
        ),

        "7": (
          <div>
            <p>
              However, the right end of the spectrum - and especially the set of
              approaches based on learning to reason from humans - is more
              neglected right now. Ought specifically aims to make progress on
              automating human-like or human-endorsed deliberation.
            </p>
            <p>
              A key challenge for these approaches is scalability: Even if we
              could learn to imitate how humans solve particular cognitive
              tasks, that wouldn't be enough. In most cases where we figured out
              how to automate cognition, we didn't just match human ability, but
              exceeded it, sometimes by a large margin. Therefore, an approach
              to AI based on imitating human meta-reasoning needs a story for
              how it could eventually exceed human ability.
            </p>
            <p>
              Aside: Usually, I fold "aligned" into the definition of{" "}
              <Link to="/research/factored-cognition/scalability">
                "scalable"
              </Link>{" "}
              and describe <Link to="/mission">Ought's mission</Link> as
              "finding scalable ways to leverage ML for deliberation".
            </p>
          </div>
        ),

        "8": (
          <div>
            <p>
              What does it mean to "automate deliberation"? Unlike in more
              concrete settings such as playing a game of Go, this is not
              immediately clear.
            </p>
            <p>
              For Go, there's a clear task (choose moves based on a game state),
              there's relevant data (recorded human games), and there's an
              obvious objective (to win the game). For deliberation, none of
              these are obvious.
            </p>
          </div>
        ),

        "9": (
          <p>
            As a task, we'll choose question-answering. This encompasses
            basically all other tasks, at least if questions are allowed to be
            big (i.e. can point to external data).
          </p>
        ),

        "10": (
          <div>
            <p>
              The data we'll train on will be recorded human actions in{" "}
              <em>cognitive workspaces</em>. I'll show an example in a couple of
              slides. The basic idea is to make thinking explicit by requiring
              people to break it down into small reasoning steps, to limit
              contextual information, and to record what information is
              available at each step.
            </p>
            <p>
              An important point here is that our goal is not to capture human
              reasoning exactly as it is in day-to-day life, but to capture{" "}
              <em>a</em> way of reasoning that people would endorse. This is
              important, because the strategies we need to use to make thinking
              explicit will necessarily change how people think.
            </p>
          </div>
        ),

        "11": (
          <div>
            <p>
              Finally, the objective will be to choose cognitive actions that
              people would endorse after deliberation.
            </p>
            <p>
              Note the weird loop - since our task is automating deliberation,
              the objective is partially defined in terms of the behavior that
              we are aiming to improve throughout the training process. This
              suggests that we might be able to set up training dynamics where
              the supervision signal always stays a step ahead of the current
              best system, analogous to GANs and self-play.
            </p>
          </div>
        ),

        "12": (
          <div>
            <p>
              We can decompose the problem of automating deliberation into two
              parts:
            </p>
            <ol>
              <li>
                How can we make deliberation sufficiently explicit that we could
                in principle replicate it using machine learning? In other
                words, how do we generate the appropriate kind of training data?
              </li>
              <li>How do we actually automate it?</li>
            </ol>
            <p>
              In case you're familiar with{" "}
              <a href="https://ai-alignment.com/iterated-distillation-and-amplification-157debfd1616">
                Iterated Distillation and Amplification
              </a>
              : The two parts roughly correspond to amplification (first part)
              and distillation (second part).
            </p>
          </div>
        ),

        "13": (
          <div>
            <p>
              The core concept behind our approach is that of a{" "}
              <em>cognitive workspace</em>. A workspace is associated with a
              question and a human user is tasked with making progress on
              thinking through that question. To do so, they have multiple
              actions available:
            </p>
            <ul>
              <li>They can reply to the question.</li>
              <li>
                They can edit a scratchpad, writing down notes about
                intermediate results and ideas on how to make progress on this
                question.
              </li>
              <li>
                They can ask sub-questions that help them answer the overall
                question.
              </li>
            </ul>
            <p>
              Sub-questions are answered in the same way, each by a different
              user. This gives rise to a tree of questions and answers. The size
              of this tree is controlled by a budget that is associated with
              each workspace and that the corresponding user can distribute over
              sub-questions.
            </p>
            <p>
              The approach we'll take to automating cognition is based on
              recording and imitating actions in such workspaces. Apart from
              information passed in through the question and through answers to
              sub-questions, each workspace is isolated from the others. If we
              show each workspace to a different user and limit the total time
              for each workspace to be short, e.g. 15 minutes, we{" "}
              <em>factor</em> the problem-solving process in a way that
              guarantees that there is no unobserved latent state that is
              accumulated over time.
            </p>
          </div>
        ),

        "14": (
          <div>
            <p>
              There are a few more technicalities that are important to making
              this work in practice.
            </p>
            <p>
              The most important one is probably the use of <em>pointers</em>.
              If we can only ask plain-text questions and sub-questions, the
              bandwidth of the question-answering system is severely limited.
              For example, we can't ask "Why did the protagonist crash the car
              in book X?" because book X would be too large to pass in as a
              literal question. Similarly, we can't delegate "Write an inspiring
              essay about architecture", because the essay would be too large to
              pass back.
            </p>
            <p>
              We can lift this restriction by allowing users to create and pass
              around pointers to data structures. A simple approach for doing
              this is to replace plain text everywhere with <em>messages</em>{" "}
              that consist of text interspersed with references to other
              messages.
            </p>
            <p>
              The combination of pointers and short per-workspace time limits
              leads to a system where many problems are best tackled in an
              algorithmic manner. For example, in many situations all a
              workspace may be doing is mapping a function (represented as a
              natural language message) over a list (a message with linked list
              structure), without the user knowing or caring about the content
              of the function and list.
            </p>
          </div>
        ),
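        /*
        A minimal sketch of the message-and-pointer idea from annotation 14,
        assuming a simple in-memory store. The names makeStore, put, and
        render, and the { ref: id } shape of a pointer, are hypothetical and
        not taken from any Ought codebase. A message is a list of parts, each
        either plain text or a pointer to another message, and a pointer can
        be passed around without expanding what it points to.

```javascript
// Hypothetical message store: ids map to messages, and a message is an
// array of parts that are either plain strings or { ref: id } pointers.
function makeStore() {
  const messages = new Map();
  let nextId = 1;
  return {
    // Store a message and return a pointer to it.
    put(parts) {
      const id = nextId++;
      messages.set(id, parts);
      return { ref: id };
    },
    // Render a pointer as text, expanding nested pointers up to `depth`
    // levels; anything deeper stays a short "#id" placeholder.
    render(pointer, depth) {
      if (depth <= 0) return "#" + pointer.ref;
      return messages
        .get(pointer.ref)
        .map(part =>
          typeof part === "string" ? part : this.render(part, depth - 1)
        )
        .join("");
    }
  };
}

const store = makeStore();
const book = store.put(["(the full text of book X)"]);
const question = store.put([
  "Why did the protagonist crash the car in ",
  book,
  "?"
]);

// The question as a user would first see it, with the book left collapsed.
const collapsed = store.render(question, 1);
// The same question with the pointer expanded one more level.
const expanded = store.render(question, 2);
```

        The point of the design is that the question itself stays short: the
        book travels as a pointer, and a user only expands it when needed.
        */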

        "15": (
          <div>
            <p>
              Now let's try to be a bit more precise about the parts of the
              system we've seen.
            </p>
            <p>
              One component is the <em>human policy</em>, which we treat as a
              stateless map from contexts (immutable versions of workspaces) to
              actions (such as asking a particular sub-question).
            </p>
            <p>
              Coming up with a single such action should take the human at most
              a few minutes.
            </p>
          </div>
        ),

        "16": (
          <div>
            <p>
              The other main component is the transition function, which
              consumes a context and an action and generates a set of new
              contexts.
            </p>
            <p>
              For example, if the action is to ask a sub-question, there will be
              two new contexts:
            </p>
            <ol>
              <li>
                The successor of the parent context that now has an additional
                reference to a sub-question.
              </li>
              <li>
                The initial context for the newly generated sub-question
                workspace.
              </li>
            </ol>
          </div>
        ),

        "17": (
          <p>
            Composed together, the human policy and the transition function
            define a kind of evaluator: A map from a context to a set of new
            contexts.
          </p>
        ),
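        /*
        A minimal sketch of how the pieces in annotations 15-17 might fit
        together. Everything here is an assumption made for illustration: the
        context shape, the names makeContext, toyPolicy, transition, and
        evaluate, and the toy rule standing in for the human. In particular, a
        reply generates no new contexts in this simplified version; a fuller
        version would pass the answer back to the parent workspace.

```javascript
// Contexts are immutable snapshots of a workspace (hypothetical shape).
const makeContext = (question, subQuestions = []) =>
  Object.freeze({ question, subQuestions: Object.freeze(subQuestions) });

// Stand-in for the human policy: a stateless map from a context to an
// action. A real policy would be a person looking at the workspace.
function toyPolicy(context) {
  if (context.subQuestions.length === 0 && context.question.includes(" and ")) {
    const firstPart = context.question.split(" and ")[0];
    return { type: "ask", question: firstPart };
  }
  return { type: "reply", answer: "(answer to: " + context.question + ")" };
}

// Transition function: consumes a context and an action and returns the
// list of new contexts that result.
function transition(context, action) {
  if (action.type === "ask") {
    return [
      // Successor of the parent, now referencing the sub-question.
      makeContext(context.question, [...context.subQuestions, action.question]),
      // Initial context of the new sub-question workspace.
      makeContext(action.question)
    ];
  }
  // Simplification: a reply closes the workspace without new contexts.
  return [];
}

// The evaluator composes the two: a map from a context to new contexts.
const evaluate = context => transition(context, toyPolicy(context));

const root = makeContext("What is 2 + 2 and what is 3 + 3");
const next = evaluate(root);
```

        Applying evaluate again to each context in next is what grows the
        tree of workspaces.
        */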

        "18": (
          <div>
            <p>
              In what follows, nodes (depicted as circles) refer to workspaces.
              Note that both inputs and outputs of workspaces can be messages
              with pointers, i.e. can be very large objects.
            </p>

            <p>
              I'll mostly collapse workspaces to just questions and answers, so
              that we can draw entire trees of workspaces more easily.
            </p>
          </div>
        ),

        "19": (
          <div>
            <p>
              By iteratively applying the evaluator, we generate increasingly
              large trees of workspaces. Over the course of this process, the
              root question will become increasingly informed by answers to
              sub-computations, and should thus become increasingly correct and
              helpful. (What exactly happens depends on how the transition
              function is set up, and what instructions we give to the human
              users.)
            </p>
            <p>
              This process is essentially identical to what Paul Christiano
              refers to as{" "}
              <a href="https://ai-alignment.com/policy-amplification-6a70cbee4f34">
                amplification
              </a>
              : A single amplification step augments an agent (in our case, a
              human question-answerer) by giving it access to calls to itself.
              Multiple amplification steps generate trees of agents assisting
              each other.
            </p>
          </div>
        ),

        "20": (
          <div>
            <p>
              I'll now walk through a few examples of different types of
              thinking by recursive decomposition.
            </p>
            <p>
              The longer-term goal behind these examples is to understand: How
              decomposable is cognitive work? That is, can amplification work -
              in general, or for specific problems, with or without strong
              bounds on the capability of the resulting system?
            </p>
            <p>
              Perhaps the easiest non-trivial case is arithmetic: To multiply
              two numbers, we can use the rules of addition and multiplication
              to break down the multiplication into a few multiplications of
              smaller numbers and add up the results.
            </p>
            <p>
              If we wanted to scale to very large numbers, we'd have to
              represent each number as a nested pointer structure instead of
              plain text as shown here.
            </p>
          </div>
        ),
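        /*
        A minimal sketch of the multiplication example from annotation 20,
        assuming non-negative integers and treating single-digit products as
        small enough to answer directly. The function name multiply and the
        trace of questions are hypothetical; the slide may split the numbers
        differently. Each call stands in for one workspace that asks two
        smaller sub-questions and combines their answers.

```javascript
// Each call stands in for one workspace; `trace` records the questions asked.
function multiply(a, b, trace = []) {
  trace.push("What is " + a + " * " + b + "?");
  // Small enough to answer directly.
  if (a < 10 && b < 10) return a * b;
  // Make `a` the larger factor, then split it: a = 10 * high + low.
  if (a < b) [a, b] = [b, a];
  const high = Math.floor(a / 10);
  const low = a % 10;
  // a * b = 10 * (high * b) + (low * b): two smaller sub-questions.
  return 10 * multiply(high, b, trace) + multiply(low, b, trace);
}

const trace = [];
const product = multiply(34, 12, trace);
// product is 408, reached through 7 questions in total.
```

        No single call needs to handle more than one split and one addition,
        which is what makes each step short enough for one workspace.
        */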

        "21": (
          <p>
            We can also implement other kinds of algorithms. Here, we're given a
            sequence of numbers as a linked list and we sum its elements one by
            one. This ends up looking pretty much the same as how you'd sum up a
            list of numbers in a functional programming language such as Lisp
            or Scheme.
          </p>
        ),
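        /*
        A minimal sketch of the list-summing example from annotation 21,
        assuming a linked list represented as nested { head, tail } pairs with
        null as the empty list. The names cons and sumList are hypothetical.
        Each recursive call corresponds to one workspace delegating "sum of
        the rest" as a sub-question.

```javascript
// A linked list as nested pairs: { head, tail }, with null as the empty list.
const cons = (head, tail) => ({ head, tail });

// Each call mirrors one workspace: check for the empty list, otherwise add
// the first element to the answer of the sub-question "sum of the rest".
function sumList(list) {
  if (list === null) return 0;
  return list.head + sumList(list.tail);
}

const numbers = cons(3, cons(5, cons(8, null)));
const total = sumList(numbers); // 16
```
        */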

        "22": (
          <div>
            <p>
              Indeed, we can implement <em>any</em> algorithm using this
              framework - it is computationally universal. One way to see this
              is to implement an evaluator for a programming language, e.g.
              following the example of the meta-circular evaluator in SICP.
            </p>
            <p>
              As a consequence, if there's a problem we can't solve using this
              sort of framework, it's not because the framework can't run the
              program required to solve it. It's because the framework can't{" "}
              <em>come up with</em> the program by composing short-term tasks.
            </p>
          </div>
        ),

        "23": (
          <p>
            Let's start moving away from obviously algorithmic examples. This
            example shows how one could generate a Fermi estimate of a quantity
            by combining upper and lower bounds for the estimates of component
            quantities.
          </p>
        ),

        "24": (
          <div>
            <p>
              This example hints at how one might implement conditioning for
              probability distributions. We could first generate a list of all
              possible outcomes together with their associated probabilities,
              then filter the list of outcomes to only include those that
              satisfy our condition, and renormalize the resulting
              (sub-)distribution such that the probabilities of all outcomes sum
              to one again.
            </p>
            <p>
              The general principle here is that we're happy to run very
              expensive computations as long as they're semantically correct.
              What I've described for conditioning is more or less the textbook
              definition of exact inference, but in general that is
              computationally intractable for distributions with many variables.
              The reason we're happy with expensive computations is that
              eventually we won't instantiate them explicitly, but rather
              emulate them using cheap ML-based function approximators.
            </p>
          </div>
        ),
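        /*
        A minimal sketch of the conditioning procedure described in annotation
        24: enumerate outcomes, filter by the condition, renormalize. The
        representation of a distribution as an array of
        { outcome, probability } pairs and the name condition are assumptions
        made for illustration. As the annotation notes, explicit enumeration
        like this is intractable for distributions with many variables.

```javascript
// Exact conditioning by enumeration: filter outcomes, then renormalize.
// `distribution` is an array of { outcome, probability } pairs.
function condition(distribution, predicate) {
  const kept = distribution.filter(entry => predicate(entry.outcome));
  const mass = kept.reduce((sum, entry) => sum + entry.probability, 0);
  if (mass === 0) throw new Error("Condition has probability zero");
  return kept.map(entry => ({
    outcome: entry.outcome,
    probability: entry.probability / mass
  }));
}

// All outcomes of one fair six-sided die roll.
const die = [1, 2, 3, 4, 5, 6].map(outcome => ({
  outcome,
  probability: 1 / 6
}));

// Condition on the roll being even: three outcomes remain, each with
// probability of about one third.
const evenRoll = condition(die, outcome => outcome % 2 === 0);
```
        */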

        "25": (
          <div>
            <p>
              If we want to use this framework to implement agents that can
              eventually exceed human capability, we can't use most human
              object-level knowledge, but rather need to set up a process that
              can learn human-like abilities from data in a more scalable way.
            </p>
            <p>
              Consider the example of understanding natural language: If we
              wanted to determine whether a pair of sentences is a
              contradiction, entailment, or neutral (as in the SNLI dataset), we
              could simply ask the human to judge - but this won't scale to
              languages that none of the human judges know.
            </p>
            <p>
              Alternatively, we can break down natural language understanding
              into (very) many small component tasks and try to solve the task
              without leveraging the humans' native language understanding
              facilities much. For example, we might start by computing the
              meaning of a sentence as a function of the meanings of all
              possible pairs of sub-phrases.
            </p>
            <p>
              As in the case of probabilistic inference, this will be
              computationally intractable, and getting the decomposition right
              in the first place is substantially harder than solving the
              object-level task.
            </p>
          </div>
        ),

        "26": (
          <div>
            <p>
              Here's a class of problems that seems particularly challenging for
              factored cognition: Problems where people would usually learn
              concepts over an extended period of time.
            </p>
            <p>
              Consider solving a problem that is posed halfway through a math
              textbook. Usually, the textbook reader would have solved many
              simpler problems up to this point and would have built up
              conceptual structures and heuristics that then allow them to solve
              this new problem. If we need to solve the problem by composing
              work done by a large collection of humans, none of whom can spend
              more than 15 minutes on the task, we'll have to replace this
              intuitive, implicit process with an externalized, explicit
              alternative.
            </p>
            <p>
              It's not entirely clear to me how to do that, but one way to start
              would be to build up knowledge about the propositions and entities
              that are part of the problem statement by effectively applying
              semantic parsing to the relevant parts of the textbook, so that we
              can later ask whether (e.g.) a proposition with meaning X implies
              a proposition with meaning Y, where both X and Y are large nested
              pointer structures that encode detailed meaning representations.
            </p>
            <p>
              If this reminds you of Good Old-Fashioned AI, it is not by
              accident. We're essentially trying to succeed where GOFAI failed,
              and our primary advantage is that we're okay with exponentially
              expensive computations, because we're not planning to ever run
              them directly. More on that soon.
            </p>
          </div>
        ),

        "27": (
          <div>
            <p>
              So far, the workspaces we've looked at were quite sparse. All
              questions and answers were limited to a sentence or two. This
              "low-bandwidth" setting is not the only way to use the system - we
              could alternatively instruct the human users to provide more
              detail in their questions and to write longer answers.
            </p>
            <p>
              For the purpose of automation, low bandwidth has advantages, both
              in the short term (where it makes automation easier) and in the
              long term (where it reduces a particular class of{" "}
              <a href="https://ai-alignment.com/security-amplification-f4931419f903">
                potential security vulnerabilities
              </a>
              ).
            </p>
            <p>
              Empirical evidence from experiments with humans will need to
              inform this choice as well, and the correct answer is probably at
              least slightly more high-bandwidth than the examples shown so far.
            </p>
          </div>
        ),

        "28": (
          <div>
            <p>
              Here's a kind of reasoning that I feel relatively optimistic that
              we can implement using factored cognition: Causal reasoning, both
              learning causal structures from data and computing the results of
              interventions and counterfactuals.
            </p>
            <p>
              The particular tree of workspaces shown here doesn't really
              illustrate this, but I can imagine implementing Pearl-style
              algorithms for causal inference in a way where each step locally
              makes sense and slightly simplifies the overall problem.
            </p>
          </div>
        ),

        "29": (
          <div>
            <p>
              The final example, meta-reasoning, is in some ways the most
              important one: If we want factored cognition to eventually produce{" "}
              <em>very</em> good solutions to problems - perhaps being
              competitive with any other systematic approach - then it's not
              enough to rely on the users directly choosing a good object-level
              decomposition for the problem at hand. Instead, they'll need to go
              meta and use the system to reason about what decompositions would
              work well, and how to find them.
            </p>
            <p>
              One kind of general pattern for this is that users can ask
              something like "What approach should we take to problem{" "}
              <code>#1</code>?" as a first sub-problem, get back an answer{" "}
              <code>#2</code>, and then ask "What is the result of executing
              approach <code>#2</code> to question <code>#1</code>?" as a second
              sub-question. As we increase the budget for the meta-question, the
              object-level approach can change radically.
            </p>
            <p>
              And, of course, we could also go meta twice, ask about approaches
              to solving the first meta-level problem, and the same
              consideration applies: Our meta-level approach to finding good
              object-level approaches could improve substantially as we invest
              more budget in meta-meta.
            </p>
          </div>
        ),

        "30": (
          <div>
            <p>
              So far, I've shown one particular instantiation of factored
              cognition: a way to structure workspaces, a certain set of
              actions, and a corresponding implementation of the transition
              function that generates new workspace versions.
            </p>
            <p>
              By varying each of these components, we can generate other ways to
              build systems in this space. For example, we might include actions
              for asking clarifying questions. I've written about these degrees
              of freedom on{" "}
              <Link to="/research/factored-cognition/taxonomy">
                our taxonomy page
              </Link>
              .
            </p>
          </div>
        ),

        "31": (
          <div>
            <p>
              Here's one example of an alternate system. This is a
              straightforward JavaScript port of parts of Paul Christiano's{" "}
              <a href="https://github.com/paulfchristiano/alba">
                ALBA implementation
              </a>
              .
            </p>
            <p>
              Workspaces are structured as sequences of observations and
              actions. All actions are commands that the user types, including{" "}
              <code>ask</code>, <code>reply</code>, <code>view</code> (for
              expanding a pointer), and <code>reflect</code> (for getting a
              pointer to the current context).
            </p>
            <p>
              The command-line version is{" "}
              <a href="https://github.com/oughtinc/hch">available on GitHub</a>.
            </p>
          </div>
        ),

        "32": (
          <div>
            <p>
              A few days ago, we open-sourced{" "}
              <a href="https://github.com/oughtinc/patchwork">Patchwork</a>, a
              new command-line app for recursive question-answering. We built it
              with particular attention to making it a good basis for multiple
              users and automation. To see a brief screencast, take a look
              at{" "}
              <a href="https://github.com/oughtinc/patchwork#patchwork">
                the README
              </a>
              .
            </p>
          </div>
        ),

        "33": (
          <p>
            Suppose decomposition worked and we could solve difficult problems
            using factored cognition - how could we transition from only using
            human labor to partial automation and eventually full automation?
            I'll discuss a few approaches, starting from very basic ideas that
            we can implement now and progressing to ones that will not be
            tractable using present-day ML.
          </p>
        ),

        "34": (
          <div>
            <p>
              Let's again consider a tree of workspaces, and in each workspace,
              one or more humans taking one or more actions.
            </p>
            <p>
              For simplicity, I'll pretend that there is just a single action
              per workspace. This allows me to treat nodes and actions as
              interchangeable below. Nothing substantial changes if there are
              multiple actions.
            </p>
            <p>
              I'll also pretend that all humans are essentially identical, which
              is obviously false, but allows me to consider the simpler problem
              of learning a single human policy from data.
            </p>
          </div>
        ),

        "35": (
          <div>
            <p>
              As a first step towards automation, we'll <em>memoize</em> the
              human H. That is, whenever we would show a context to H, we first
              check whether we've shown this context to some other H before, and
              if so, we directly reuse the action that was taken previously.
            </p>
            <p>
              This is a big win if many contexts are simple. For example, it may
              be very common to want to map a function over a list, and this
              operation will always involve the same kinds of sub-questions
              (check if the list is empty, if not get the first element, apply
              the function to the first element, etc). Ideally, we only do this
              sort of work once and then reuse it in the future. Memoization
              gets us part of the way there.
            </p>
          </div>
        ),

        "36": (
          <div>
            <p>
              As a significant step up in difficulty, we can try to imitate the
              behavior of H even in cases where the new context doesn't match
              any element of our "training set" exactly.
            </p>
            <p>
              Of course, for general question-answering, we won't be able to
              fully automate the human policy any time soon. This means that any
              imitation algorithms we do apply will need to make choices about
              whether a context under consideration is the sort of situation
              where they can make good predictions about what a human would do,
              or whether to abstain.
            </p>
            <p>
              If algorithms can make multiple choices in sequence, we need
              algorithms that are well-calibrated about when their actions are
              appropriate, and that in particular have very few false positives.
              Otherwise, even a relatively low probability of false positives
              could cascade into sequences of inappropriate actions.
            </p>
          </div>
        ),

        "37": (
          <p>
            We've tried to isolate and study this particular problem - making
            well-calibrated predictions in AI-complete domains - in a separate
            project called{" "}
            <Link to="/research/judgments">Predicting Slow Judgments</Link>. So
            far, we've found it challenging to make non-trivial predictions
            about human responses for the dataset we've collected there.
          </p>
        ),

        "38": (
          <p>
            How useful would it be to be able to automate some fraction of human
            actions? If the total number of actions needed to solve a task is
            exponentially large (e.g. because we're enumerating all potential
            sub-phrases of a paragraph of text), even being able to automate 90%
            of all actions wouldn't be enough to make this approach
            computationally tractable. To get to tractability in that regime, we
            need to automate entire subtrees. (And we need to do so using an
            amount of training data that is not itself exponentially large - an
            important aspect that this talk won't address at all.)
          </p>
        ),

        "39": (
          <div>
            <p>
              Let's reconsider amplification. Recall that in this context, each
              node represents the question-answer behavior implemented by a
              workspace operated on by some agent (to start with, a human). This
              agent can pose sub-questions to other agents who may or may not
              themselves get to ask such sub-questions, as indicated by whether
              they have nodes below them or not.
            </p>
            <p>
              Each step grows the tree of agents by one level, so after{" "}
              <em>n</em> steps, we have a tree of size{" "}
              <em>
                O(2<sup>n</sup>)
              </em>
              . This process will become intractable before long.
            </p>
            <p>
              (The next few slides describe Paul Christiano's{" "}
              <a href="https://ai-alignment.com/iterated-distillation-and-amplification-157debfd1616">
                Iterated Distillation and Amplification
              </a>{" "}
              approach to training ML systems.)
            </p>
          </div>
        ),

        "40": (
          <p>
            Instead of iterating amplification, let's pause after one step. We
            started out with a single agent (a human) and then built a composite
            system using multiple agents (also all humans). This composite
            system is slower than the one we started out with. This slowdown
            perhaps isn't too bad for a single step, but it will add up over the
            course of multiple steps. To iterate amplification many times, we
            need to avoid this slowdown. What can we do?
          </p>
        ),

        "41": (
          <p>
            The basic idea is to train an ML-based agent to imitate the behavior
            of the composite system. A simple (but insufficient!) approach would
            be to generate training data - questions and answers - based on the
            behavior of the composite system, and to train a supervised learner
            using this dataset.
          </p>
        ),

        "42": (
          <div>
            <p>
              In practice, this sort of training ("distillation") would probably
              need to involve not just imitation, but more advanced techniques,
              including adversarial training and approaches to interpretability
              that allow the composite system (the "overseer") to reason about
              the internals of its fast ML-based successor.
            </p>
            <p>
              If we wanted to implement this training step in rich domains, we'd
              need ML techniques that are substantially better than the state of
              the art as of May 2018, and even then, some domains would almost
              certainly resist efficient distillation.
            </p>
          </div>
        ),

        "43": (
          <p>
            But, hypothetically, if we could implement faithful distillation, we
            would have a much better starting point for the next amplification
            step: We could compose together multiple instances of the fast
            ML-based learner, and the result would be a tree of agents that is
            only as large as the one we built in the first step (3 nodes, say),
            but exhibits the question-answer behavior of an agent that has
            multiple advisors, <em>each of which</em> is as capable as the entire
            tree at the first step.
          </p>
        ),

        "45": (
          <p>
            We can repeat whatever training process we used in the first step to
            get a yet better distilled system that "imitates" the behavior of
            the overseer composed of the systems trained in the previous step.
          </p>
        ),

        "48": (
          <p>
            Through repeated amplification and distillation, we could hope to
            eventually saturate the representational and computational abilities
            of whatever ML system we're using in the distillation step, while
            guiding it towards implementing question-answer behavior that
            corresponds to what H would do if they had a large number of
            well-resourced assistants.
          </p>
        ),

        "49": (
          <p>
            In practice, we might not want to implement this process as a series
            of distinct systems, and instead run self-play where a single system
            serves both as the overseer and the system-to-be-trained.
          </p>
        ),

        "50": (
          <div>
            <p>
              If Iterated Distillation and Amplification can work, we might be
              able to approximate the results of running some computations that
              would naively take exponential time: after <em>n</em> steps of
              amplification and distillation, we'd use a fast ML-based
              approximator to run computations that would take{" "}
              <em>
                O(2<sup>n</sup>)
              </em>{" "}
              time if we instantiated them explicitly.
            </p>
            <p>
              As a particularly interesting special case, this might include the
              kinds of human-guided computations that arise from people taking
              actions in cognitive workspaces.
            </p>
          </div>
        ),

        "51": (
          <div>
            <p>
              There are many open questions for the scheme described above, both
              on whether we can make reasoning explicit, and on whether we can
              automate it efficiently even if it is made explicit. While I've
              talked a bit about automation, anything beyond basic automation is
              out of scope for Ought right now, so I'll focus on open questions
              related to decomposition.
            </p>
            <p>
              For decomposition, the two main questions we ask ourselves are:
            </p>
            <ol>
              <li>
                Can factored cognition recover the ability of a single human
                working over time for essentially all important tasks?
              </li>
              <li>
                If so, can we exceed the capability of other systematic
                approaches to problem-solving if we just use sufficiently large
                budgets, i.e. compose sufficiently many small workspaces in
                sufficiently large trees? Equivalently, can we reach essentially
                arbitrarily high capability if we execute sufficiently many
                amplification steps?
              </li>
            </ol>
            <p>
              Our plan is to study both of these questions using a set of
              challenge problems.
            </p>
          </div>
        ),

        "52": (
          <div>
            <p>
              The idea behind these challenge problems is to pick problems that
              are particularly likely to stretch the capabilities of problem
              solving by decomposition:
            </p>
            <ol>
              <li>
                When people tackle tricky <em>math or programming puzzles</em>,
                they sometimes give up, go to bed, and the next day in the
                shower suddenly know how to solve them. Can we solve such
                puzzles even if no single individual spends more than 15 minutes
                on the problem?
              </li>
              <li>
                We've already seen a math textbook example earlier. We want to
                know more generally whether we can replicate the effects of
                learning over time, and are planning to study this using
                different kinds of <em>textbook problems</em>.
              </li>
              <li>
                Similarly, when people reason about evidence, e.g. about whether
                a statement that a politician made is true, they seem to make
                incremental updates to opaque internal models and may use
                heuristics that they find difficult to verbalize. If we instead
                require all evidence to be aggregated explicitly, can we still
                match or exceed their <em>fact-checking</em> capabilities?
              </li>
              <li>
                All the example problems we've seen so far are one-off problems.
                However, ultimately we want to use automated systems to interact
                with a stateful world, e.g. through <em>dialog</em>. Abstractly,
                we know{" "}
                <Link to="/research/factored-cognition/taxonomy#interaction">
                  how to approach this situation
                </Link>
                , but we'd like to try it in practice, e.g. on personal questions
                such as "Where should I go on vacation?".
              </li>
              <li>
                For systems to scale to high capability, we've noted earlier
                that they will need to reason about cognitive strategies, not
                just object-level facts. <em>Prioritizing tasks</em> for a user
                might be a domain particularly suitable for testing this, since
                the same kind of reasoning (what to work on next) could be used
                on both object- and meta-level.
              </li>
            </ol>
          </div>
        ),

        "53": (
          <div>
            <p>
              If we make progress on the feasibility of factored cognition and
              come to believe that it might be able to match and eventually
              exceed "normal" thinking, we'd like to move towards learning more
              about how this process would play out.
            </p>
            <p>
              What would the human policy - the map from contexts to actions -
              look like that would have these properties? What concepts would be
              part of this policy? For scaling to high capability, it probably
              can't leverage most of the object-level knowledge people have. But
              what else? Abstract knowledge about how to reason? About
              causality, evidence, agents, logic? And how big would this policy
              be - could we effectively treat it as a lookup table, or are there
              many questions and answers in distinct domains that we could only
              really learn to imitate using sophisticated ML?
            </p>
            <p>
              What would happen if we scaled up by iterating this learned human
              policy many times? What instructions would the humans that
              generate our training data need to follow for the resulting system
              to remain corrigible, even if run with extremely large amounts of
              computation (as might be the case if distillation works)? Would
              the behavior of the resulting system be chaotic, strongly
              dependent on its initial conditions, or could we be confident that
              there is a{" "}
              <a href="https://ai-alignment.com/corrigibility-3039e668638">
                basin of attraction
              </a>{" "}
              that all careful ways of setting up such a system converge to?
            </p>
          </div>
        )
      }}
    >
      <p>
        The presentation below motivates our{" "}
        <Link to="/research/factored-cognition">Factored Cognition</Link>{" "}
        project from an AI alignment angle and describes the state of our work
        as of May 2018. Andreas gave versions of this presentation at CHAI
        (4/25), a DeepMind-FHI seminar (5/24) and FHI (5/25).
      </p>
    </Presentation>
  </Page>
);

export default FactoredCognitionMay2018;
