Describe structure of Data Sets and Importers

The function codeplan() creates a data frame that describes the structure of an item list (a data.set object or an importer object), so that this structure can be stored and recovered. The resulting data frame has a particular print method that limits the output to one line per variable.

With setCodeplan an item list structure (as returned by codeplan()) can be applied to a data frame or data set. It is also possible to use an assignment like codeplan(x) <- value to a similar effect.

Usage

codeplan(x)
# S4 method for item.list
codeplan(x)
# S4 method for item
codeplan(x)
setCodeplan(x,value)
# S4 method for data.frame,codeplan
setCodeplan(x,value)
# S4 method for data.frame,NULL
setCodeplan(x,value)
# S4 method for data.set,codeplan
setCodeplan(x,value)
# S4 method for data.set,NULL
setCodeplan(x,value)
# S4 method for item,codeplan
setCodeplan(x,value)
# S4 method for item,NULL
setCodeplan(x,value)
# S4 method for atomic,codeplan
setCodeplan(x,value)
# S4 method for atomic,NULL
setCodeplan(x,value)
codeplan(x) <- value
read_codeplan(filename,type)
write_codeplan(x,filename,type,pretty)

Arguments

x: for codeplan(x), an object that inherits from class "item.list" (that is, a "data.set" object or an "importer" object) or from class "item". For setCodeplan(x,value) and codeplan(x) <- value, a data frame, a "data.set" object, an "item" object, or an atomic vector. For write_codeplan, an object of class "codeplan".
value: an object as it would be returned by codeplan(x) or NULL.
filename: a character string, the name of the file that is to be read or to be written.
type: a character string (either "yaml" or "json") or NULL (the default); gives the type of the file into which the codeplan is written or from which it is read. If type is NULL, the file type is inferred from the file name ending (".yaml" or ".yml" for "yaml", ".json" for "json").
pretty: a logical value, whether the JSON output created by write_codeplan(...) should be prettified.
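
The interplay of filename, type, and pretty can be sketched as follows. This is a minimal illustration rather than a canonical recipe: the data set and file names are made up, and it assumes that the packages used behind the scenes for YAML and JSON (presumably yaml and jsonlite) are installed.

```r
library(memisc)

# Tiny illustrative data set (hypothetical values)
ds <- data.set(income = c(1200, 2500, 3100))
ds <- within(ds, {
  description(income) <- "Household income"
  measurement(income) <- "ratio"
})
cp <- codeplan(ds)

# type is left at its default (NULL), so the format is inferred
# from the ".yaml" file name ending
fn.yaml <- paste0(tempfile(), ".yaml")
write_codeplan(cp, filename = fn.yaml)

# The file name has no recognizable ending here, so type is given explicitly;
# pretty = TRUE asks for prettified JSON output
fn.any <- tempfile()
write_codeplan(cp, filename = fn.any, type = "json", pretty = TRUE)

# Read both files back in
cp.yaml <- read_codeplan(fn.yaml)
cp.json <- read_codeplan(fn.any, type = "json")
```

Giving type explicitly is the safer choice whenever the file name ending is not under your control.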

Value

If applicable, codeplan returns a list with additional S3 class attribute "codeplan". For arguments for which the relevant information does not exist, the function returns NULL.

The list has one or more elements, each named after a variable in the "item.list" or "data.set" x. Each list element is itself a list with the following elements:

annotation: a named character vector.
labels: a named list of labels and labelled values.
value.filter: a list with at least two elements named "class" and "filter", and optionally another element named "range". The "class" element determines the class of the value filter and equals either "missing.values", "valid.values", or "valid.range". An element named "range" is only needed if "class" is "missing.values", as it is possible (like in SPSS) to have both individual missing values and a range of missing values.
mode: a character string that describes storage mode, such as "character", "integer", or "numeric".
measurement: a character string with the measurement level, "nominal", "ordinal", "interval", or "ratio".

If codeplan(x)<-value or setCodeplan(x,value) is used and value is NULL, all the special information about annotation, labels, value filters, etc. is removed from the resulting object, which then is usually a mere atomic vector or data frame.
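
The list structure described above can be inspected with ordinary list operations. The following is a minimal sketch; the data set, its variable name, and the codes are hypothetical and serve only to illustrate the components.

```r
library(memisc)

# Small illustrative data set; variable name and codes are made up
ds <- data.set(vote = c(1, 2, 3, 8, 9, 1, 2))
ds <- within(ds, {
  description(vote) <- "Vote intention"
  measurement(vote) <- "nominal"
  labels(vote) <- c(Conservatives       = 1,
                    Labour              = 2,
                    "Liberal Democrats" = 3,
                    "Don't know"        = 8,
                    "Answer refused"    = 9)
  missing.values(vote) <- c(8, 9)
})

cp <- codeplan(ds)

class(cp)                     # includes "codeplan"
names(cp)                     # one entry per variable
cp.vote <- cp[["vote"]]       # the description of a single variable
names(cp.vote)                # annotation, labels, value.filter, ...
cp.vote$annotation            # named character vector
cp.vote$measurement           # measurement level as a character string
```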

Examples

Data1 <- data.set(
          vote = sample(c(1,2,3,8,9,97,99),size=300,replace=TRUE),
          region = sample(c(rep(1,3),rep(2,2),3,99),size=300,replace=TRUE),
          income = exp(rnorm(300,sd=.7))*2000
          )

Data1 <- within(Data1,{
  description(vote) <- "Vote intention"
  description(region) <- "Region of residence"
  description(income) <- "Household income"
  foreach(x=c(vote,region),{
    measurement(x) <- "nominal"
    })
  measurement(income) <- "ratio"
  labels(vote) <- c(
                    Conservatives         =  1,
                    Labour                =  2,
                    "Liberal Democrats"   =  3,
                    "Don't know"          =  8,
                    "Answer refused"      =  9,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  labels(region) <- c(
                    England               =  1,
                    Scotland              =  2,
                    Wales                 =  3,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  foreach(x=c(vote,region,income),{
    annotation(x)["Remark"] <- "This is not a real survey item, of course ..."
    })
  missing.values(vote) <- c(8,9,97,99)
  missing.values(region) <- c(97,99)
})
cpData1 <- codeplan(Data1)

Data2 <- data.frame(
          vote = sample(c(1,2,3,8,9,97,99),size=300,replace=TRUE),
          region = sample(c(rep(1,3),rep(2,2),3,99),size=300,replace=TRUE),
          income = exp(rnorm(300,sd=.7))*2000
          )
codeplan(Data2) <- cpData1
codeplan(Data2)
#> 
#> vote:
#>   annotation:
#>     description: Vote intention
#>     Remark: This is not a real survey item, of course ...
#>   labels:
#>     Conservatives: 1.0
#>     Labour: 2.0
#>     Liberal Democrats: 3.0
#>     Don't know: 8.0
#>     Answer refused: 9.0
#>     Not applicable: 97.0
#>     Not asked in survey: 99.0
#>   value.filter:
#>     class: missing.values
#>     values:
#>     - 8.0
#>     - 9.0
#>     - 97.0
#>     - 99.0
#>   mode: numeric
#>   measurement: nominal
#> region:
#>   annotation:
#>     description: Region of residence
#>     Remark: This is not a real survey item, of course ...
#>   labels:
#>     England: 1.0
#>     Scotland: 2.0
#>     Wales: 3.0
#>     Not applicable: 97.0
#>     Not asked in survey: 99.0
#>   value.filter:
#>     class: missing.values
#>     values:
#>     - 97.0
#>     - 99.0
#>   mode: numeric
#>   measurement: nominal
#> income:
#>   annotation:
#>     description: Household income
#>     Remark: This is not a real survey item, of course ...
#>   mode: numeric
#>   measurement: ratio
#> 
codebook(Data2)
#> ================================================================================
#> 
#>    vote 'Vote intention'
#> 
#> --------------------------------------------------------------------------------
#> 
#>    Storage mode: double
#>    Measurement: nominal
#>    Missing values: 8, 9, 97, 99
#> 
#>    Values and labels              N Valid Total
#>                                                
#>     1   'Conservatives'          52  37.1  17.3
#>     2   'Labour'                 52  37.1  17.3
#>     3   'Liberal Democrats'      36  25.7  12.0
#>     8 M 'Don't know'             46        15.3
#>     9 M 'Answer refused'         40        13.3
#>    97 M 'Not applicable'         32        10.7
#>    99 M 'Not asked in survey'    42        14.0
#> 
#>    Remark:
#>        This is not a real survey item, of course ...
#> 
#> ================================================================================
#> 
#>    region 'Region of residence'
#> 
#> --------------------------------------------------------------------------------
#> 
#>    Storage mode: double
#>    Measurement: nominal
#>    Missing values: 97, 99
#> 
#>    Values and labels              N Valid Total
#>                                                
#>     1   'England'               129  51.2  43.0
#>     2   'Scotland'               82  32.5  27.3
#>     3   'Wales'                  41  16.3  13.7
#>    99 M 'Not asked in survey'    48        16.0
#> 
#>    Remark:
#>        This is not a real survey item, of course ...
#> 
#> ================================================================================
#> 
#>    income 'Household income'
#> 
#> --------------------------------------------------------------------------------
#> 
#>    Storage mode: double
#>    Measurement: ratio
#> 
#>         Min:   320.401
#>         Max: 21218.421
#>        Mean:  2517.911
#>    Std.Dev.:  2113.267
#> 
#>    Remark:
#>        This is not a real survey item, of course ...
#> 

# Note the difference between 'as.data.frame' and setting
# the codeplan to NULL:
Data2df <- as.data.frame(Data2)
codeplan(Data2) <- NULL
str(Data2)
#> 'data.frame':	300 obs. of  3 variables:
#>  $ vote  : num  3 1 2 1 9 8 1 2 2 97 ...
#>  $ region: num  2 3 1 99 1 3 3 3 1 1 ...
#>  $ income: num  5737 738 5895 3209 2247 ...
str(Data2df)
#> 'data.frame':	300 obs. of  3 variables:
#>  $ vote  : Factor w/ 3 levels "Conservatives",..: 3 1 2 1 NA NA 1 2 2 NA ...
#>   ..- attr(*, "label")= chr "Vote intention"
#>  $ region: Factor w/ 3 levels "England","Scotland",..: 2 3 1 NA 1 3 3 3 1 1 ...
#>   ..- attr(*, "label")= chr "Region of residence"
#>  $ income: num  5737 738 5895 3209 2247 ...
#>   ..- attr(*, "label")= chr "Household income"
codeplan(Data2) <- NULL # Does not change anything

# Codeplans of individual survey items can also be queried and modified:
vote <- Data1$vote
str(vote)
#>  Nmnl. item w/ 7 labels for 1,2,3,... + ms.v.  num [1:300] 8 3 2 9 1 99 2 3 97 1 ...
cp.vote <- codeplan(vote)
codeplan(vote) <- NULL
str(vote)
#>  num [1:300] 8 3 2 9 1 99 2 3 97 1 ...
codeplan(vote) <- cp.vote
vote
#> 
#> Item 'Vote intention' (measurement: nominal, type: double, length = 300) 
#> 
#>  [1:300] *Don't know Liberal Democrats Labour *Answer refused Conservatives ...

fn.json <- paste0(tempfile(),".json")
write_codeplan(codeplan(Data1),filename=fn.json)
codeplan(Data2) <- read_codeplan(fn.json)
codeplan(Data2)
#> 
#> vote:
#>   annotation:
#>     description: Vote intention
#>     Remark: This is not a real survey item, of course ...
#>   labels:
#>     Conservatives: 1
#>     Labour: 2
#>     Liberal Democrats: 3
#>     Don't know: 8
#>     Answer refused: 9
#>     Not applicable: 97
#>     Not asked in survey: 99
#>   value.filter:
#>     class: missing.values
#>     values:
#>     - 8
#>     - 9
#>     - 97
#>     - 99
#>   mode: numeric
#>   measurement: nominal
#> region:
#>   annotation:
#>     description: Region of residence
#>     Remark: This is not a real survey item, of course ...
#>   labels:
#>     England: 1
#>     Scotland: 2
#>     Wales: 3
#>     Not applicable: 97
#>     Not asked in survey: 99
#>   value.filter:
#>     class: missing.values
#>     values:
#>     - 97
#>     - 99
#>   mode: numeric
#>   measurement: nominal
#> income:
#>   annotation:
#>     description: Household income
#>     Remark: This is not a real survey item, of course ...
#>   mode: numeric
#>   measurement: ratio
#>
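
The examples above use the assignment form codeplan(x) <- value throughout. The functional form setCodeplan(x,value) listed under Usage can be sketched in the same way; the data below are hypothetical, and the sketch assumes that setCodeplan returns the modified object rather than changing x in place, as is usual for R functions.

```r
library(memisc)

# Source data set that carries the metadata (illustrative values)
src <- data.set(region = c(1, 2, 3, 99, 1, 2))
src <- within(src, {
  description(region) <- "Region of residence"
  measurement(region) <- "nominal"
  labels(region) <- c(England               = 1,
                      Scotland              = 2,
                      Wales                 = 3,
                      "Not asked in survey" = 99)
  missing.values(region) <- 99
})

# Plain data frame with a matching variable name
raw <- data.frame(region = c(3, 3, 1, 99, 2, 1))

# Apply the codeplan; the result is assigned to a new object
annotated <- setCodeplan(raw, codeplan(src))
codeplan(annotated)

# Passing NULL removes the metadata again
stripped <- setCodeplan(annotated, NULL)
str(stripped)
```

The functional form is convenient when the original object should be kept alongside the annotated copy, for example inside a pipeline of function calls.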