
Dataset used to predict whether income exceeds $50K/yr based on census data. Also known as the "Census Income" dataset. The train dataset contains 13 features and 30178 observations. The test dataset contains 13 features and 15315 observations. The target column is "target": a binary factor where 1: <=50K and 2: >50K for annual income. The column "sex" is set as the protected attribute.

Source

Dua, Dheeru, Graff, Casey (2017). “UCI Machine Learning Repository.” http://archive.ics.uci.edu/ml/.

Pre-processing

  • fnlwgt: Removed. This is the final weight, i.e. the number of people the census believes the entry represents.

  • native-country: Removed. This is the country of origin for an individual.

  • Rows containing NA in workclass and occupation have been removed.

  • Pre-processing inspired by the article: https://cseweb.ucsd.edu//classes/sp15/cse190-c/reports/sp15/048.pdf
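The steps above can be sketched in base R. This is only an illustration under stated assumptions, not the code used to build the package data: the data frame `raw` and its values are made up, and only the column names mentioned in this page are taken from the documentation.

```r
# Hypothetical toy stand-in for the raw census data (values are invented).
raw <- data.frame(
  age = c(39L, 50L, 38L, 53L),
  workclass = c("State-gov", NA, "Private", "Private"),
  fnlwgt = c(77516L, 83311L, 215646L, 234721L),
  occupation = c("Adm-clerical", "Exec-managerial", NA, "Handlers-cleaners"),
  `native-country` = c("United-States", "United-States", "United-States", "Cuba"),
  target = c("<=50K", "<=50K", "<=50K", ">50K"),
  check.names = FALSE,       # keep the hyphen in "native-country"
  stringsAsFactors = FALSE
)

# Step 1: drop the fnlwgt and native-country columns.
keep_cols <- setdiff(names(raw), c("fnlwgt", "native-country"))
cleaned <- raw[, keep_cols, drop = FALSE]

# Step 2: remove rows with NA in workclass or occupation.
has_both <- complete.cases(cleaned[, c("workclass", "occupation")])
cleaned <- cleaned[has_both, , drop = FALSE]

print(cleaned)
```

In this toy input, the second row has a missing workclass and the third a missing occupation, so both are dropped by the second step.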

Metadata

  • (integer) age: The age of the individuals

  • (factor) workclass: A general term to represent the employment status of an individual

  • (factor) education: The highest level of education achieved by an individual.

  • (integer) education_num: the highest level of education achieved in numerical form.

  • (factor) marital_status: marital status of an individual.

  • (factor) occupation: the general type of occupation of an individual

  • (factor) relationship: whether the individual is in a relationship

  • (factor) race: Descriptions of an individual’s race

  • (factor) sex: the biological sex of the individual

  • (integer) capital-gain: capital gains for an individual

  • (integer) capital-loss: capital loss for an individual

  • (integer) hours-per-week: the hours an individual has reported to work per week

  • (factor) target: whether or not an individual makes more than $50,000 annually

Examples

data("adult_test", package = "mlr3fairness")
data("adult_train", package = "mlr3fairness")
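After loading, a quick inspection can confirm the shape of each split and the coding of the target. The following is a minimal sketch that assumes the mlr3fairness package is installed; it only uses the "target" and "sex" columns described above and skips the checks when the package is unavailable.

```r
# Only run the inspection if the package providing the data is available.
if (requireNamespace("mlr3fairness", quietly = TRUE)) {
  data("adult_train", package = "mlr3fairness")
  data("adult_test", package = "mlr3fairness")

  # Number of rows and columns in each split.
  print(dim(adult_train))
  print(dim(adult_test))

  # Levels of the binary target factor.
  print(levels(adult_train$target))

  # Cross-tabulate the protected attribute against the target.
  print(table(adult_train$sex, adult_train$target))
}
```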