// @flow

import * as React from "react";
import { Link } from "react-router-dom";

import Section from "../../../components/Section/Section";
import Page from "../../../components/Page/Page";
import GridLayout from "../../../components/GridLayout/GridLayout";

const Tasks = () => (
  <Page
    className="Tasks"
    title="A set of tasks for evaluating scalable problem solving"
    next={{
      url: "/research/factored-cognition/networks-of-workspaces",
      title: "Networks of Workspaces",
      description: "A user-friendly approach to capability amplification",
    }}
  >
    <GridLayout>
      <Section title="Summary">
        <div>
          <p>
            How can we decide which{" "}
            <Link to="/research/factored-cognition/taxonomy">mechanisms</Link>{" "}
            for{" "}
            <Link to="/research/factored-cognition/">factored cognition</Link>{" "}
            to implement, and how can we evaluate them once implemented? Given
            that our goal is to find mechanisms that are plausibly{" "}
            <Link to="/research/factored-cognition/scalability">scalable</Link>,
            we now outline a set of tasks that would provide evidence of
            scalability if a system solves them using only short-term human
            work.
          </p>
        </div>
      </Section>
      <Section title="Aims for tasks">
        <p>These tasks should have the following properties:</p>

        <ol>
          <li>
            Each task exercises one or more essential capabilities of the
            system. These include:
            <ol>
              <li>
                Reasoning with external knowledge that is too large for
                individual instances of H to handle
              </li>
              <li>
                Reasoning incrementally and explicitly in cases that humans
                would approach holistically and implicitly by default
              </li>
              <li>
                Working with concepts that no individual H can learn given the
                available time
              </li>
              <li>
                Interacting with the external world, both when it is treated as
                stateful (e.g., generating dialog) and when not (e.g., running
                computations)
              </li>
              <li>
                Reasoning not just about object-level ideas, but also about what
                cognitive strategies to employ
              </li>
            </ol>
          </li>
          <li>
            If the system succeeds at all tasks, it is likely that it has
            sufficient capability to solve much more complex tasks given a
            sufficiently large budget for calls to H. I don’t think we can hope
            to be confident in universality, even if the system solves all tasks
            we come up with. New failure modes—such as security failures of
            H—may show up with very large budgets.
          </li>
          <li>
            The tasks are not unnecessarily complex. That is, if there is a task
            that tests the same properties while requiring fewer calls to H, we
            would rather pick that task.
          </li>
          <li>
            The tasks are sufficiently rich that we can observe interesting
            changes in the quality of solutions as we increase the budget the
            system has to work with.
          </li>
        </ol>
      </Section>
      <Section title="Proposed tasks">
        <p>
          Here are some tasks I’m currently considering. Each task exercises all
          of the capabilities 1a-1e to some extent. To illustrate some of the
          motivation, I will still highlight a single capability that is
          especially exercised by that task:
        </p>
        <ol>
          <li>
            <strong>
              Answering questions about books: “Why didn’t Harry kiss Sally?”
            </strong>
            <br />
            Reasoning with large external knowledge bases
          </li>
          <li>
            <strong>Fact checking: “Is it true that [proposition]?”</strong>
            <br />
            Reasoning incrementally and explicitly
          </li>
          <li>
            <strong>
              Early math textbook exercises: “Show that there are no wffs of
              length 6.”
            </strong>
            <br />
            Working with concepts that no individual H can learn
          </li>
          <li>
            <strong>
              Cost-benefit analysis: “Which Airbnb should I stay at?”
            </strong>
            <br />
            Interaction with a stateful world (the asker)
          </li>
          <li>
            <strong>Prioritizing todos: “What should I work on next?”</strong>
            <br />
            Reasoning about cognitive strategies
          </li>
        </ol>
        <p>
          The last two tasks require an approach to{" "}
          <Link to="/research/factored-cognition/taxonomy#interaction">
            interaction
          </Link>
          , so that our system can ask a limited number of follow-up questions
          to elicit personal information that is required to produce a good
          solution.
        </p>
      </Section>
    </GridLayout>
  </Page>
);

export default Tasks;
