
get_bio() uses standard ChatGPT chat completions to retrieve structured data from input text and allows for fully customizable prompts.

get_bio_function_call() uses ChatGPT function calling to retrieve structured data from input text.

Usage

get_bio(
  bio,
  bio_name = NULL,
  prompt = NULL,
  prompt_fields = NULL,
  prompt_fields_formats = NULL,
  prompt_fields_values = NULL,
  prompt_fewshot = NULL,
  openai_api_key = NULL,
  openai_model = "gpt-3.5-turbo",
  openai_temperature = 0,
  openai_seed = NULL
)

get_bio_function_call(
  bio,
  bio_name = NULL,
  prompt_fields = NULL,
  prompt_fields_formats = NULL,
  prompt_fields_values = NULL,
  prompt_fields_descriptions = NULL,
  prompt_fewshot = NULL,
  openai_api_key = NULL,
  openai_model = "gpt-3.5-turbo",
  openai_temperature = 0,
  openai_seed = NULL
)

Arguments

bio

The biographical text to be processed, a string.

bio_name

The name of the individual whose biographical information is desired, a string. For get_bio(), bio_name can also be a character vector containing the names of all individuals whose biographical information is desired.

prompt

Only for use in get_bio(). An optional custom prompt, a string. It overrides the default prompt and should include any desired prompt fields, formats, and values.
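A custom prompt is an ordinary string, so it can be assembled from the field names, formats, and allowed values it needs to mention. The sketch below is illustrative only: the wording is an assumption and is not the package's default prompt, and the get_bio() call is left commented out because it needs an OpenAI API key. With a custom prompt, get_bio() returns the unprocessed API output rather than a parsed tibble.

```r
# Hypothetical custom prompt: the wording is illustrative and is NOT the
# package's default prompt.
fields <- c("college", "gender")

custom_prompt <- paste0(
  "From the biography, extract these fields: ",
  paste(fields, collapse = ", "), ". ",
  "Format college as {SCHOOL} - {DEGREE}. ",
  "Allowed values for gender: Male, Female."
)
cat(custom_prompt, "\n")

# With an OpenAI API key configured, the call would look like:
# get_bio(bio = "John Smith went to Nowhere University, and he graduated with a B.A.",
#         prompt = custom_prompt)
```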

prompt_fields

A character vector of desired biographical output fields (e.g., "college", "graduate_school").

prompt_fields_formats

A named list of strings giving desired formats for output fields (e.g., "{SCHOOL} - {DEGREE}"). Names should be present in prompt_fields.
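As a minimal sketch, a prompt_fields_formats list pairs each field with a format string, and its names can be checked against prompt_fields before any API call is made. The field names and formats below come from this page; the check itself is a plain base-R assumption about good practice, not something the package is documented to do.

```r
# Output fields requested (names taken from this page)
prompt_fields <- c("college", "graduate_school", "birth_date")

# One format string per field; braces mark the parts the model should fill in
prompt_fields_formats <- list(
  college         = "{SCHOOL} - {DEGREE}",
  graduate_school = "{SCHOOL} - {DEGREE}",
  birth_date      = "{MM}/{DD}/{YYYY}"
)

# Every name in the formats list should also appear in prompt_fields
stopifnot(all(names(prompt_fields_formats) %in% prompt_fields))
```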

prompt_fields_values

A named list of character vectors of desired output values for each prompt field. Names should be present in prompt_fields.
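prompt_fields_values restricts a field to a fixed set of answers. The allowed values below are hypothetical examples chosen for illustration; the sketch only builds the list and confirms that its names match prompt_fields and that every entry is a character vector.

```r
prompt_fields <- c("gender", "highest_level_of_education")

# Hypothetical sets of allowed answers for each field
prompt_fields_values <- list(
  gender = c("Male", "Female", "Other"),
  highest_level_of_education = c("High School", "B.A.", "B.S.", "M.B.A.", "J.D.", "Ph.D.")
)

# Names must be present in prompt_fields, and each entry must be a character vector
stopifnot(all(names(prompt_fields_values) %in% prompt_fields))
stopifnot(all(vapply(prompt_fields_values, is.character, logical(1))))
```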

prompt_fewshot

A data.frame or tibble of complete worked examples. It should have a column called 'bio' containing unstructured example text, a column called 'bio_name' containing the name of the individual in the example (if applicable), and one column of example output for every field in prompt_fields.

  • get_bio() Example: data.frame(bio = "John Smith went to Nowhere University, and he graduated with a B.A.", bio_name = "John Smith", gender = "Male", college = "Nowhere University - B.A.")
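The one-row example above can be built and checked before it is passed as prompt_fewshot. This is a sketch assuming prompt_fields is c("gender", "college"); the column check mirrors the requirements stated above ('bio' plus one column per field, with 'bio_name' optional).

```r
prompt_fields <- c("gender", "college")

# Example row copied from the documentation above
prompt_fewshot <- data.frame(
  bio = "John Smith went to Nowhere University, and he graduated with a B.A.",
  bio_name = "John Smith",
  gender = "Male",
  college = "Nowhere University - B.A."
)

# Required columns: 'bio' plus one column per requested field
missing_cols <- setdiff(c("bio", prompt_fields), names(prompt_fewshot))
stopifnot(length(missing_cols) == 0)
```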

openai_api_key

API key for OpenAI, a string. If this is NULL, get_bio() searches .Renviron for the API key.
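.Renviron files hold KEY=value lines that R loads into environment variables. The sketch below writes a throwaway file and loads it with base R's readRenviron(). The variable name OPENAI_API_KEY is an assumption (this page does not name the variable the package looks for), and loading the file overwrites any value already set in the current session. For a persistent setup, the same line would normally go in the user-level ~/.Renviron (for example via usethis::edit_r_environ()), followed by an R restart.

```r
# Assumed variable name: OPENAI_API_KEY is not confirmed by this page.
env_file <- tempfile(fileext = ".Renviron")
writeLines("OPENAI_API_KEY=sk-your-key-here", env_file)

# readRenviron() sets every KEY=value pair in the file for this session
readRenviron(env_file)
key <- Sys.getenv("OPENAI_API_KEY")
nchar(key) > 0
```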

openai_model

ChatGPT model to use, a string. Defaults to "gpt-3.5-turbo".

openai_temperature

A number between 0 and 2 specifying the amount of randomness in ChatGPT's output; higher values give more random output. Defaults to 0.

openai_seed

An integer specifying a random seed for ChatGPT (this feature is still in development at OpenAI, so it might not work perfectly).
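The documented constraints on openai_temperature (a number from 0 to 2) and openai_seed (an integer) can be checked locally before a request is sent. check_openai_settings() below is a hypothetical helper written for illustration; it is not part of the package, and the package may validate these arguments differently or not at all.

```r
# Hypothetical helper (not part of the package): enforces the documented ranges
check_openai_settings <- function(openai_temperature = 0, openai_seed = NULL) {
  stopifnot(
    is.numeric(openai_temperature), length(openai_temperature) == 1,
    openai_temperature >= 0, openai_temperature <= 2
  )
  if (!is.null(openai_seed)) {
    # The seed must be a single whole number
    stopifnot(is.numeric(openai_seed), length(openai_seed) == 1,
              openai_seed == round(openai_seed))
  }
  invisible(TRUE)
}

check_openai_settings(openai_temperature = 0, openai_seed = 123)
# check_openai_settings(openai_temperature = 2.5)  # raises an error
```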

prompt_fields_descriptions

Only for use in get_bio_function_call(). A named list of strings with additional text describing each prompt field. Names should be present in prompt_fields.
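A prompt_fields_descriptions list follows the same named-list pattern as the formats and values arguments. The description strings below are hypothetical wording for illustration; the sketch only confirms that each name appears in prompt_fields.

```r
prompt_fields <- c("highest_level_of_education", "previous_occupation")

# Hypothetical extra text describing each field for get_bio_function_call()
prompt_fields_descriptions <- list(
  highest_level_of_education = "The most advanced degree the person completed",
  previous_occupation = "Jobs held before the person's current position, with years"
)

stopifnot(all(names(prompt_fields_descriptions) %in% prompt_fields))
```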

Value

A tibble containing the desired biographical information, or the unprocessed API output when a custom prompt is supplied.
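Because each output column is a string in the requested format, it can be split further after the call. The sketch below uses a stand-in data.frame whose value is copied from the prompt_fewshot example ("Nowhere University - B.A."); it is not actual package output. It splits the "{SCHOOL} - {DEGREE}" format on its " - " separator with base R.

```r
# Stand-in for one returned row; value copied from the prompt_fewshot example
result <- data.frame(college = "Nowhere University - B.A.")

# Split "{SCHOOL} - {DEGREE}" at the literal " - " separator
parts <- strsplit(result$college, " - ", fixed = TRUE)
result$college_school <- vapply(parts, `[`, character(1), 1)
result$college_degree <- vapply(parts, `[`, character(1), 2)
result
```

This simple split assumes the school name itself never contains " - "; a more distinctive separator in the format string would make parsing more robust.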

Examples

# Biographical Information about Kevin McCarthy from
# https://bioguide.congress.gov/search/bio/M001165
get_bio(bio = "MCCARTHY, KEVIN, a Representative from California;
              born in Bakersfield, Kern County, Calif., January 26,
              1965; graduated from Bakersfield High School,
              Bakersfield, Calif., 1983; attended Bakersfield College,
              Bakersfield. Calif., 1983-1986; B.S., California State
              University, Bakersfield, Calif., 1989; M.B.A., California
              State University, Bakersfield, Calif., 1994; staff,
              United States Representative William Thomas of California,
              1987-2002; member of the California state assembly,
              2002-2007, minority leader, 2004-2006; elected as a
              Republican to the One Hundred Tenth and to the eight
              succeeding Congresses (January 3, 2007-present); majority
              whip (One Hundred Twelfth and One Hundred Thirteenth
              Congresses); majority leader (One Hundred Thirteenth
              through One Hundred Fifteenth Congresses); minority
              leader (One Hundred Sixteenth and One Hundred Seventeenth
              Congress); Speaker of the House (One Hundred Eighteenth
              Congress).",
        bio_name = "Kevin McCarthy")
#> No prompt_fields argument provided. Defaulting to: birth_date, highest_level_of_education, college, graduate school, previous_occupation, gender, town_of_birth, state_of_birth, married.
#> Input Tokens: 429
#> Output Tokens: 138
#> Total Tokens: 567
#> # A tibble: 1 × 9
#>   birth_date highest_level_of_educ…¹ college graduate_school previous_occupation
#>   <chr>      <chr>                   <chr>   <chr>           <chr>              
#> 1 01/26/1965 M.B.A.                  Bakers… California Sta… staff, United Stat…
#> # ℹ abbreviated name: ¹​highest_level_of_education
#> # ℹ 4 more variables: gender <chr>, town_of_birth <chr>, state_of_birth <chr>,
#> #   married <chr>
get_bio_function_call(bio = "MCCARTHY, KEVIN, a Representative from California;
                             born in Bakersfield, Kern County, Calif., January 26,
                             1965; graduated from Bakersfield High School,
                             Bakersfield, Calif., 1983; attended Bakersfield College,
                             Bakersfield. Calif., 1983-1986; B.S., California State
                             University, Bakersfield, Calif., 1989; M.B.A., California
                             State University, Bakersfield, Calif., 1994; staff,
                             United States Representative William Thomas of California,
                             1987-2002; member of the California state assembly,
                             2002-2007, minority leader, 2004-2006; elected as a
                             Republican to the One Hundred Tenth and to the eight
                             succeeding Congresses (January 3, 2007-present); majority
                             whip (One Hundred Twelfth and One Hundred Thirteenth
                             Congresses); majority leader (One Hundred Thirteenth
                             through One Hundred Fifteenth Congresses); minority
                             leader (One Hundred Sixteenth and One Hundred Seventeenth
                             Congress); Speaker of the House (One Hundred Eighteenth
                             Congress).",
                       bio_name = "Kevin McCarthy",
                       prompt_fields = c("highest_level_of_education",
                                         "previous_occupation", "birth_date"),
                       prompt_fields_formats = list(
                         highest_level_of_education = "{DEGREE}",
                         previous_occupation = "{OCCUPATION} - {YEARS}",
                         birth_date = "{MM}/{DD}/{YYYY}"))
#> Input Tokens: 494
#> Output Tokens: 86
#> Total Tokens: 580
#> # A tibble: 1 × 3
#>   highest_level_of_education previous_occupation                      birth_date
#>   <chr>                      <chr>                                    <chr>     
#> 1 M.B.A.                     staff, United States Representative Wil… 01/26/1965
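The birth_date column comes back as a character string in the requested "{MM}/{DD}/{YYYY}" format, so converting it to an R Date takes a matching format specification. The value below is copied from the output above; the conversion is plain base R and is not something the package does itself.

```r
# birth_date value copied from the output above
birth_date <- "01/26/1965"

# "%m/%d/%Y" matches the requested {MM}/{DD}/{YYYY} format
parsed <- as.Date(birth_date, format = "%m/%d/%Y")
format(parsed, "%Y-%m-%d")
```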