```r
# look at and change global Pmetrics options
setPMoptions()
```
4 Data
4.1 Introduction
Make sure you have run the tutorial setup code in your R session before copying, pasting and running example code here.
Pmetrics always needs data and a model to run. Pmetrics data objects are typically read into memory from files. Although the file format is usually comma-separated (.csv), it is possible to use other separators, like the semicolon, by setting the appropriate argument with setPMoptions.
Examples of programs that can save .csv files are any text editor (e.g. TextEdit on Mac, Notepad on Windows) or spreadsheet program (e.g. Excel).
It is possible to create a data object in R directly, without reading a file. This is useful for simulation purposes, where you may want to create a small dataset on the fly. We’ll cover this below.
4.2 R6 objects
Most Pmetrics objects, including data, follow the R6 framework. A PM_data object represents a dataset that is going to be modeled or simulated, and all of its behaviour is defined by the `PM_data` class. This class allows datasets to be checked, plotted, written to disk and more. Use `PM_data$new("filename")` to create a PM_data object by reading the file.
4.3 First data object
```r
# if not using the Rscript/Learn.R template created by PM_tutorial(),
# modify the path as needed
dat <- PM_data$new("src/ex.csv")
```
You can also build an appropriate data frame in R and provide that as an argument to `PM_data$new()`.
```r
# ensure data frame has at least these columns:
# id, time, dose, out
df <- data.frame(id = c(1, 1, 1, 2, 2),
                 time = c(0, 1, 2, 0, 1),
                 dose = c(100, NA, NA, 200, NA),
                 out = c(NA, 5.2, 3.1, NA, 7.4)
)
dat_df <- PM_data$new(df)
```
Lastly, you can use the `addEvent` method of PM_data objects to build a data object on the fly. This is particularly useful for making quick simulation templates. Start with an empty call to `PM_data$new()` and add successive rows. See `PM_data` for details under the `addEvent` method.
```r
# build a PM_data object row by row
dat_add <- PM_data$new()$
  addEvent(id = 1, time = 0, dose = 100, addl = 5, ii = 24)$ # add 6 doses of 100 every 24 hours
  addEvent(id = 1, time = 144, out = -1)$ # add an observation of -1 at time 144
  addEvent(id = 1, wt = 75, validate = TRUE) # add wt of 75 to all rows for id = 1 and validate
```
Notes:
- Omitting `time` in the last `addEvent` adds wt = 75 to all rows for id = 1.
- Use `validate = TRUE` as an argument in the last `addEvent` to finalize creation.
- You can chain events as shown above by including `$` between them.
For those familiar with the tidyverse or native R pipes for joining functions (`%>%` or `|>`, respectively), chaining in R6 is similar but restricted to methods defined for the object. Here we chain the `addEvent` methods. We could even chain an additional PM_data method like `$plot()` at the end of the above code. However, that would make `dat_add` a plotly plot object rather than a PM_data object.
Below are the data standardization and validation reports generated when the new PM_data object is created, followed by what `dat_add$data` and `dat_add$standard_data` look like in the viewer. The former is your original data, and the latter is the data after standardization to the full Pmetrics format.
```
#>
#> ── DATA STANDARDIZATION ────────────────────────────────────────────────────────
#> EVID inferred as 0 for observations, 1 for doses.
#> All doses assumed to be oral (DUR = 0).
#> ADDL set to missing for all records.
#> II set to missing for all records.
#> All doses assumed to be INPUT = 1.
#> All observations assumed to be OUTEQ = 1.
#> All observations assumed to be uncensored.
#> One or more error coefficients not specified. Error in model object will be used.
#>
#> ── DATA VALIDATION ─────────────────────────────────────────────────────────────
#> No data errors found.
```
Original data:
| id | time | dose | out | wt |
|---|---|---|---|---|
| 1 | 0 | 100 | NA | 75 |
| 1 | 24 | 100 | NA | 75 |
| 1 | 48 | 100 | NA | 75 |
| 1 | 72 | 100 | NA | 75 |
| 1 | 96 | 100 | NA | 75 |
| 1 | 120 | 100 | NA | 75 |
| 1 | 144 | NA | -1 | 75 |
Standardized data:
| id | evid | time | dur | dose | addl | ii | input | out | outeq | cens | c0 | c1 | c2 | c3 | wt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 100 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 75 |
| 1 | 1 | 24 | 0 | 100 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 75 |
| 1 | 1 | 48 | 0 | 100 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 75 |
| 1 | 1 | 72 | 0 | 100 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 75 |
| 1 | 1 | 96 | 0 | 100 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 75 |
| 1 | 1 | 120 | 0 | 100 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 75 |
| 1 | 0 | 144 | NA | NA | NA | NA | NA | -1 | 1 | none | NA | NA | NA | NA | 75 |
Once you have created the PM_data object, you do not need to create it again during your R session. Nor do you have to copy the data file to the Runs folder each time you run the model, as you did with older (“Legacy”) versions of Pmetrics. The data are stored in memory and can be used by any Pmetrics function that needs them.
4.4 Data format
R6 Pmetrics can use file or data frame input. The format is very flexible. A truncated example is shown below, with NA values replaced by “.” as they would appear in a file.
| id | time | dose | out | wt |
|---|---|---|---|---|
| 1 | 0 | 600 | . | 46.7 |
| 1 | 24 | 600 | . | 46.7 |
| 1 | 48 | 600 | . | 46.7 |
| 1 | 72 | 600 | . | 46.7 |
| 1 | 96 | 600 | . | 46.7 |
| 1 | 120 | . | 10.44 | 46.7 |
| 1 | 120 | 600 | . | 46.7 |
| 1 | 121 | . | 12.89 | 46.7 |
| 1 | 122 | . | 14.98 | 46.7 |
| 1 | 125.99 | . | 16.69 | 46.7 |
| 1 | 129 | . | 20.15 | 46.7 |
| 1 | 132 | . | 14.97 | 46.7 |
| 1 | 143.98 | . | 12.57 | 46.7 |
| 2 | 0 | 600 | . | 66.5 |
| 2 | 24 | 600 | . | 66.5 |
| 2 | 48 | 600 | . | 66.5 |
| 2 | 72 | 600 | . | 66.5 |
| 2 | 96 | 600 | . | 66.5 |
| 2 | 120 | . | 3.56 | 66.5 |
| 2 | 120 | 600 | . | 66.5 |
| 2 | 120.98 | . | 5.84 | 66.5 |
| 2 | 121.98 | . | 6.54 | 66.5 |
| 2 | 126 | . | 6.14 | 66.5 |
| 2 | 129.02 | . | 6.56 | 66.5 |
| 2 | 132.02 | . | 4.44 | 66.5 |
| 2 | 144 | . | 3.76 | 66.5 |
The only required columns are those below. Unlike Legacy Pmetrics, there are no requirements for a header or to prefix the ID column with “#”. However, any subsequent row that begins with “#” will be ignored, which is helpful if you want to exclude data from the analysis, but preserve the integrity of the original dataset, or to add comment lines. The column order can be anything you wish, but the names should be the same as below. Ultimately, PM_data$new() converts all valid data into a standardized format discussed below.
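The following minimal base-R sketch makes the comment-row rule concrete on a small inline file. The file contents are made up. It only mimics the documented behaviour, in which rows after the header that begin with “#” are dropped. It is not how `PM_data$new()` is implemented.

```r
# Made-up file contents; the third data row is commented out
lines <- c(
  "id,time,dose,out",
  "1,0,100,.",
  "1,1,.,5.2",
  "#1,2,.,3.1",
  "2,0,200,."
)
# Keep the header, then drop any later row that begins with "#"
keep <- c(TRUE, !grepl("^#", lines[-1]))
# "." placeholders are read as NA
dat <- read.csv(text = lines[keep], na.strings = ".")
print(dat)
```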
ID This field can be numeric or character and identifies each individual. All rows must contain an ID, and all records from one individual must be contiguous. IDs may be any alphanumeric combination. The number of subjects is unlimited.
TIME This is the elapsed time in decimal hours since the first event, which is always `TIME = 0`, unless you specify `TIME` as clock time. In that case, you must include a `DATE` column, described below. For clock time, the default format is HH:MM. Other formats can be specified. See `PM_data` for more details. Every row must have an entry, and within a given ID, rows must be sorted chronologically, earliest to latest.
DATE This column is only required if `TIME` is clock time, detected by the presence of “:”. The default format of the date column is YYYY-MM-DD. As for `TIME`, other formats can be specified. See `PM_data` for more details.
DOSE This is the dose amount. It should be “.” for observation rows. All subjects must have a dose event at time 0, which is the first row for that subject. The dose amount can be any numeric value, including 0. If the dose is an infusion, the `DUR` column must also be included. In other software packages, `AMT` is equivalent to `DOSE`.
OUT This is the observation, or output value, and it is always required. If `EVID = 0`, there must be an entry. For such events, if the observation is missing (e.g. a sample was lost or not obtained), it must be coded as -99. `OUT` is ignored for any other `EVID` and therefore should be “.”. In other software packages, `OUT` may be coded as `DV`. `OUT = -99` is equivalent to `MDV = 1` (missing dependent variable) in other packages, but Pmetrics does not use `MDV`.
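A few lines of base R can check the ID and TIME rules above: records for each subject are contiguous, rows are in chronological order within a subject, and the first row is a dose at time 0. `check_required()` is a hypothetical helper written for this illustration, not a Pmetrics function. `PM_data$new()` runs its own, more complete validation.

```r
# Hypothetical helper (not part of Pmetrics) checking three of the rules above
check_required <- function(df) {
  # IDs must be contiguous: each ID appears as a single run of rows
  runs <- rle(as.character(df$id))$values
  contiguous <- !any(duplicated(runs))
  # Within each ID, TIME must be non-decreasing (earliest to latest)
  sorted <- all(tapply(df$time, df$id, function(t) !is.unsorted(t)))
  # The first row of each subject must be a dose at time 0
  first <- df[!duplicated(df$id), ]
  first_dose <- all(first$time == 0 & !is.na(first$dose))
  c(contiguous = contiguous, sorted = sorted, first_dose = first_dose)
}

good <- data.frame(id = c(1, 1, 2, 2), time = c(0, 1, 0, 1),
                   dose = c(100, NA, 200, NA), out = c(NA, 5.2, NA, 7.4))
bad <- good[c(1, 3, 2, 4), ]  # interleaves subjects 1 and 2
check_required(good)
check_required(bad)
```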
Not required:
- COVARIATES… Covariates are optional and discussed below. Here, wt was included as an example of a covariate.
When PM_data reads a file, it standardizes it to the format below, which means some inferences are made. For example, in the absence of an `EVID` column, `EVID` is inferred as 0 for observations and 1 for doses. In the absence of a `DUR` column, all doses are interpreted as oral (`DUR = 0`). If doses are infusions, `DUR` must be included to indicate the duration of the infusion. `EVID` only needs to be included if `EVID = 4` (reset event) is required, as described below. Similarly, `INPUT` and `OUTEQ` are only required if multiple inputs or outputs are being modeled. `ADDL` and `II` are also optional.
Finally, the standardized data are checked for errors. If any are found, Pmetrics generates a report listing them and attempts to fix those that it can.
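The hypothetical function `standardize_sketch()` gives a rough picture of the inferences described above. When a column is absent, it fills in the values reported in the standardization messages. It assumes lowercase column names and covers only `EVID`, `DUR`, `INPUT`, `OUTEQ` and `CENS`. The real standardization in `PM_data$new()` does considerably more.

```r
# Hypothetical sketch of a few standardization inferences (not Pmetrics code)
standardize_sketch <- function(df) {
  is_dose <- !is.na(df$dose)
  # EVID inferred as 1 for doses, 0 for observations
  if (!("evid" %in% names(df))) df$evid <- ifelse(is_dose, 1, 0)
  # No DUR column: every dose treated as oral/bolus (DUR = 0)
  if (!("dur" %in% names(df))) df$dur <- ifelse(is_dose, 0, NA)
  # Single input and single output equation assumed
  if (!("input" %in% names(df))) df$input <- ifelse(is_dose, 1, NA)
  if (!("outeq" %in% names(df))) df$outeq <- ifelse(is_dose, NA, 1)
  # Observations assumed uncensored
  if (!("cens" %in% names(df))) df$cens <- ifelse(is_dose, NA, "none")
  df
}

df <- data.frame(id = c(1, 1, 1, 2, 2), time = c(0, 1, 2, 0, 1),
                 dose = c(100, NA, NA, 200, NA), out = c(NA, 5.2, 3.1, NA, 7.4))
std <- standardize_sketch(df)
std
```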
4.4.1 Standardized Data
Data are standardized when PM_data$new() is invoked, and the data frame is placed in the PM_data object’s $standard_data field. When the $save() method is called on a PM_data object, the data are saved in this standardized format. The first several rows of example standardized data are below, with details following.
| id | evid | time | dur | dose | addl | ii | input | out | outeq | cens | c0 | c1 | c2 | c3 | wt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 |
| 1 | 1 | 24 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 |
| 1 | 1 | 48 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 |
| 1 | 1 | 72 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 |
| 1 | 1 | 96 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 |
| 1 | 0 | 120 | NA | NA | NA | NA | NA | 10.44 | 1 | none | NA | NA | NA | NA | 46.7 |
| 1 | 1 | 120 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 |
| 1 | 0 | 121 | NA | NA | NA | NA | NA | 12.89 | 1 | none | NA | NA | NA | NA | 46.7 |
| 1 | 0 | 122 | NA | NA | NA | NA | NA | 14.98 | 1 | none | NA | NA | NA | NA | 46.7 |
| 1 | 0 | 125.99 | NA | NA | NA | NA | NA | 16.69 | 1 | none | NA | NA | NA | NA | 46.7 |
| 1 | 0 | 129 | NA | NA | NA | NA | NA | 20.15 | 1 | none | NA | NA | NA | NA | 46.7 |
| 1 | 0 | 132 | NA | NA | NA | NA | NA | 14.97 | 1 | none | NA | NA | NA | NA | 46.7 |
| 1 | 0 | 143.98 | NA | NA | NA | NA | NA | 12.57 | 1 | none | NA | NA | NA | NA | 46.7 |
| 2 | 1 | 0 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 66.5 |
| 2 | 1 | 24 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 66.5 |
| 2 | 1 | 48 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 66.5 |
| 2 | 1 | 72 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 66.5 |
| 2 | 1 | 96 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 66.5 |
| 2 | 0 | 120 | NA | NA | NA | NA | NA | 3.56 | 1 | none | NA | NA | NA | NA | 66.5 |
| 2 | 1 | 120 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 66.5 |
| 2 | 0 | 120.98 | NA | NA | NA | NA | NA | 5.84 | 1 | none | NA | NA | NA | NA | 66.5 |
| 2 | 0 | 121.98 | NA | NA | NA | NA | NA | 6.54 | 1 | none | NA | NA | NA | NA | 66.5 |
| 2 | 0 | 126 | NA | NA | NA | NA | NA | 6.14 | 1 | none | NA | NA | NA | NA | 66.5 |
| 2 | 0 | 129.02 | NA | NA | NA | NA | NA | 6.56 | 1 | none | NA | NA | NA | NA | 66.5 |
| 2 | 0 | 132.02 | NA | NA | NA | NA | NA | 4.44 | 1 | none | NA | NA | NA | NA | 66.5 |
| 2 | 0 | 144 | NA | NA | NA | NA | NA | 3.76 | 1 | none | NA | NA | NA | NA | 66.5 |
ID See above.
EVID This is the event ID field. It can be 0, 1, or 4. It is only required if `EVID = 4` is included in the data, in which case every row must have an entry. If there are no `EVID = 4` events, the entire `EVID` column can be omitted from the data.
- 0 = observation
- 1 = input (e.g. dose)
- 2, 3 are currently unused
- 4 = reset, where all compartment values are set to 0 and the time counter is reset to 0. This is useful when an individual has multiple sampling episodes that are widely spaced in time with no new information gathered. This is a dose event, so dose information needs to be complete. The `TIME` value for `EVID = 4` should be 0, and subsequent rows should increase monotonically from 0 until the last record or until another `EVID = 4` event, which will restart time at 0.
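The `EVID = 4` timing rule can be checked in base R. The sketch takes the vectors for a single subject and requires time to be 0 on each reset row and non-decreasing until the next reset. `check_resets()` is a hypothetical illustration, not part of Pmetrics.

```r
# Hypothetical check for one subject (not a Pmetrics function)
check_resets <- function(evid, tm) {
  # Episode counter increments at every EVID = 4 row
  episode <- cumsum(evid == 4)
  # Each reset row must restart time at 0
  resets_at_zero <- all(tm[evid == 4] == 0)
  # Within each episode, time must be non-decreasing
  monotone <- all(tapply(tm, episode, function(t) !is.unsorted(t)))
  resets_at_zero && monotone
}

evid <- c(1, 0, 0, 4, 0, 0)
tm   <- c(0, 2, 12, 0, 2, 12)
check_resets(evid, tm)
```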
TIME See above.
DATE See above.
DUR This is the duration of an infusion in hours. If `EVID = 0` (observation event), `DUR` is ignored and should have a “.” placeholder. For a bolus (e.g. an oral dose), set the value equal to 0. As mentioned above, if all doses are oral, `DUR` can be omitted from the data altogether. Some other packages use `RATE` instead of `DUR`, but rate can be converted to duration with `DUR = DOSE / RATE`.
DOSE See above.
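If your data come from a package that records infusions by `RATE`, you can apply the `DUR = DOSE / RATE` conversion before building the PM_data object. This sketch assumes hypothetical lowercase columns `dose` and `rate`. It also assumes that a rate of 0 denotes a bolus (as in NONMEM), which maps to `DUR = 0`.

```r
# Made-up records: two infusions and one bolus (rate 0)
nm <- data.frame(dose = c(500, 500, 100), rate = c(250, 100, 0))
# DUR = DOSE / RATE for infusions; a bolus gets DUR = 0
nm$dur <- ifelse(nm$rate > 0, nm$dose / nm$rate, 0)
nm
```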
ADDL This specifies the number of additional doses to give at interval `II`. `ADDL` can be positive or negative. If positive, it is the number of doses to give after the dose at time 0. If negative, it is the number of doses to give before the dose at time 0. It may be missing (“.”) for dose events (`EVID = 1` or `EVID = 4`), in which case it is assumed to be 0. It is ignored for observation (`EVID = 0`) events. Be sure to adjust the time entry for the subsequent row, if necessary, to account for the extra doses. All compartments in the model will contain the predicted amounts of drug at the end of the `II` interval after the last `ADDL` dose.
II This is the interdose interval and is only relevant if `ADDL` is not equal to 0, in which case `II` cannot be missing. If `ADDL = 0` or is missing, `II` is ignored.
INPUT This defines which input (i.e. drug) the `DOSE` corresponds to. The model defines which compartments receive the input(s). If only modeling one drug, `INPUT` is unnecessary, as all values will be assumed to be 1. Other packages may use `CMT` (compartment) for both inputs and outputs. In Pmetrics these must be separated: for inputs, designate the corresponding model input number with `INPUT` (e.g. R[x] or B[x] for infusions and boluses in the model object), not the compartment.
OUT See above.
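You can see the effect of `ADDL` and `II` on a single dose row by expanding it into explicit doses. The hypothetical helper below handles only positive `ADDL`. With `addl = 5` and `ii = 24`, it produces the six doses from 0 to 120 hours seen in the `addEvent` example earlier in this chapter.

```r
# Hypothetical helper (not Pmetrics code): expand one dose row into explicit doses
expand_addl <- function(time, dose, addl, ii) {
  # Missing or zero ADDL: just the original dose; II is ignored
  if (is.na(addl) || addl == 0) return(data.frame(time = time, dose = dose))
  if (addl < 0) stop("negative ADDL is not handled in this sketch")
  # Original dose plus `addl` more, each `ii` hours apart
  data.frame(time = time + ii * (0:addl), dose = dose)
}

expand_addl(time = 0, dose = 100, addl = 5, ii = 24)
```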
OUTEQ This is the output equation number that corresponds to the `OUT` value. Output equations are defined in the model file. If only modeling one output, this column is unnecessary, as all values are assumed to be 1. As discussed under `INPUT`, other packages may use `CMT` (compartment) for both inputs and outputs. In Pmetrics these must be separated: for outputs, designate the corresponding model output equation number with `OUTEQ`, not the compartment.
CENS This is a new column as of Pmetrics 3.0.0. It indicates whether the observation is censored, i.e. below a lower limit of quantification or above an upper limit of quantification. It can take on four values:
- Missing for dose events, which are not observations. Use a “.” as a placeholder in your data file.
- 0 or “none” = not censored
- 1 or “bloq” = left censored (below lower limit of quantification)
- -1 or “aloq” = right censored (above upper limit of quantification)
If there are no censored observations, the entire `CENS` column can be omitted from the data. In data fitting, left censored observations are handled using the M3 method described by Beal (Beal 2001). Right censored observations are handled similarly, but using the complementary probability. For left censored observations, the value in the `OUT` column is the lower limit of quantification (LLOQ). For right censored observations, it is the upper limit of quantification (ULOQ). For uncensored observations, `OUT` is the observed value as usual. For example, if `OUT = 5` and `CENS = 1` or `CENS = "bloq"`, the observation is below the LLOQ of 5. If `OUT = 10` and `CENS = -1` or `CENS = "aloq"`, the observation is above the ULOQ of 10.
C0, C1, C2, C3 These are the coefficients for the assay error polynomial for that observation. Each subject may have up to one set of coefficients per output equation. If more than one set is detected for a given subject and output equation, the last set will be used. If there are no available coefficients, these columns may be omitted. If they are included, use “.” as a placeholder for events which are not observations. In data fitting, if the coefficients are present in the data file, Pmetrics will use them. If they are missing, Pmetrics will look for coefficients defined in the model.
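The following simplified sketch shows how censoring could enter the likelihood, assuming a normal residual error. It also assumes the assay error polynomial gives the standard deviation as C0 + C1*Y + C2*Y^2 + C3*Y^3, evaluated here at the model prediction for simplicity. Pmetrics' own error model has further terms that are not shown. Uncensored observations contribute a density. Left censored (M3) observations contribute the probability of falling below the LLOQ, and right censored observations the complementary probability above the ULOQ. Both function names are hypothetical, and only the character `CENS` codes are handled.

```r
# Assumed assay error polynomial: SD as a cubic in Y
assay_sd <- function(y, c0, c1, c2, c3) c0 + c1 * y + c2 * y^2 + c3 * y^3

# Hypothetical per-observation likelihood under a normal error model
obs_likelihood <- function(out, cens, pred, err_sd) {
  switch(cens,
    none = dnorm(out, mean = pred, sd = err_sd),  # uncensored: density at OUT
    bloq = pnorm(out, mean = pred, sd = err_sd),  # M3: P(Y < LLOQ), OUT holds LLOQ
    aloq = pnorm(out, mean = pred, sd = err_sd,   # P(Y > ULOQ), OUT holds ULOQ
                 lower.tail = FALSE),
    stop("unknown CENS code")
  )
}

err_sd <- assay_sd(4, c0 = 0.5, c1 = 0.1, c2 = 0, c3 = 0)
obs_likelihood(5, "bloq", pred = 4, err_sd = err_sd)
```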
COVARIATES… Any column with a name other than those above is assumed to be a covariate, one column per covariate. The first row for any subject must have a value for all covariates, since the first row is always a dose. Covariates are handled differently than in Legacy Pmetrics. In Legacy, they were only considered at the times of dose events (`EVID = 1` or `EVID = 4`). In Pmetrics 3.0 and later, they are considered at all times, including observation events (`EVID = 0`). Therefore, to enter a new covariate value at a time other than an existing dose or observation, create a row at the appropriate time (and date, if using clock/calendar time). Make that row either a dose row with `DOSE = 0` or an observation row with `OUT = -99` (missing). By default, covariate values are linearly interpolated between entries. This is useful for covariates like weight, which may vary from measurement to measurement. You can change this behavior in the model definition to make them piece-wise constant, i.e. carried forward from the previous value until a new value causes an instant change. This could be used, for example, to indicate periods off and on dialysis. See the chapter on Models for more details.
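Base R's `approx()` shows the difference between the two covariate behaviours, using made-up weight entries at 0 and 48 hours. This only illustrates linear interpolation versus carry-forward and does not call Pmetrics. The Models chapter covers how a model declares piece-wise constant covariates.

```r
# Made-up covariate entries for one subject
cov_time <- c(0, 48)
wt       <- c(70, 74)
t_out    <- c(0, 24, 48)

# Default behaviour: linear interpolation between entries
wt_linear <- approx(cov_time, wt, xout = t_out, method = "linear")$y
# Piece-wise constant: carry the previous value forward (f = 0)
wt_const <- approx(cov_time, wt, xout = t_out, method = "constant", f = 0)$y
wt_linear
wt_const
```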
4.5 Manipulation of CSV files
4.5.0.1 Read
As we have seen, `PM_data$new("path/filename")` creates a new PM_data object by reading an appropriate data file in the path directory, or in the current working directory if path is omitted. Change the column separator in the file from the default “,” (.csv files) to “;” (.ssv files) using `setPMoptions()`.
4.5.0.2 Save
`PM_data$save("path/filename")` saves the `$standard_data` field of a PM_data object to a file called “filename” in the path directory, or in the current working directory if path is omitted. This can be useful if you have loaded or created a data file and then changed it in R. Change the column separator in the file from the default “,” (.csv files) to “;” (.ssv files) using `setPMoptions()`.
4.5.0.3 Standardize
PM_data$new() automatically standardizes the data into the full format. This includes conversion of calendar date / clock time into decimal elapsed time.
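The clock-time conversion can be sketched in base R. The sketch assumes the default HH:MM and YYYY-MM-DD formats and a single subject whose first event defines time 0. UTC is used to avoid daylight-saving shifts. This is an illustration, not the code `PM_data$new()` runs.

```r
# Made-up calendar dates and clock times for one subject
dates <- c("2024-03-01", "2024-03-01", "2024-03-02")
clock <- c("08:00", "20:30", "08:15")

# Combine into timestamps, then express as decimal hours since the first event
stamp <- as.POSIXct(paste(dates, clock), format = "%Y-%m-%d %H:%M", tz = "UTC")
elapsed <- as.numeric(difftime(stamp, stamp[1], units = "hours"))
elapsed
```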
4.5.0.4 Validate
PM_data$new() automatically calls PMcheck so the data are validated as the data object is created.
4.5.0.5 Data conversion
`PMwrk2csv()` This function will convert old-style, single-drug USC*PACK .wrk formatted files into Pmetrics data .csv files.
`NM2PM()` Although the structure of Pmetrics data files is similar to NONMEM, there are some differences. This function attempts to automatically convert NONMEM data to Pmetrics format. It has been tested on several examples, but there are probably NONMEM files which will cause it to crash.
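The following minimal sketch shows the column mapping described earlier in this chapter: `AMT` becomes `DOSE`, `DV` becomes `OUT`, and missing observations flagged by `MDV = 1` are recoded as `OUT = -99`. The input data are made up and the mapping is deliberately incomplete. Use `NM2PM()` for real conversions.

```r
# Made-up NONMEM-style records: one dose, one observation, one missing sample
nm <- data.frame(ID = c(1, 1, 1), TIME = c(0, 1, 2),
                 AMT = c(100, NA, NA), DV = c(NA, 5.2, NA), MDV = c(1, 0, 1))

# Rename the core columns to their Pmetrics equivalents
pm <- data.frame(id = nm$ID, time = nm$TIME, dose = nm$AMT, out = nm$DV)

# Observation rows (no AMT) with MDV = 1 become OUT = -99 (missing observation)
obs_missing <- is.na(nm$AMT) & nm$MDV == 1
pm$out[obs_missing] <- -99
pm
```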
4.6 More Examples
Pmetrics comes with an example dataset called dataEx already loaded. You can practice with it. It is the same data as in “src/ex.csv” used above to create the dat object.
In the code below and often in this book, file.path is a base R function used to create file paths that are compatible with your operating system.
```r
# Save data somewhere
path <- "src2"
dir.create(path) # create a temporary folder
dataEx$save(file.path(path, "ex2.csv")) # save the data there
dataEx$save("src2/ex2.csv") # alternative
# Load it again with one of these alternatives
exData <- PM_data$new(file.path(path, "ex2.csv"))
exData <- PM_data$new("src2/ex2.csv")
unlink("src2", recursive = TRUE) # clean up
```
You can look at the src/ex.csv file directly by opening it from your hard drive in a spreadsheet program like Excel, or in a text editor.
`exData` is an R6 object, which means that it contains both data and methods to process that data.
```r
# See the contents of the object
names(exData)
#> [1] "nca" ".__enclos_env__" "summary" "auc"
#> [5] "addEvent" "post" "clone" "initialize"
#> [9] "standard_data" "save" "print" "plot"
#> [13] "data" "pop"
```
The `.__enclos_env__` element is an artifact of the R6 class. The remaining elements are documented in the help for `PM_data`. You can of course inspect the data directly.
```r
# Your original data (first few rows)
head(exData$data)
#> id time dose out wt africa age gender height
#> 1 1 0 600 NA 46.7 1 21 1 160
#> 2 1 24 600 NA 46.7 1 21 1 160
#> 3 1 48 600 NA 46.7 1 21 1 160
#> 4 1 72 600 NA 46.7 1 21 1 160
#> 5 1 96 600 NA 46.7 1 21 1 160
#> 6 1 120 NA 10.44 46.7 1 21 1 160
```
Typing the name of the PM_data object will display it nicely in the viewer.
```r
# See the standardized data nicely formatted in the viewer
exData
```
Below we show it truncated for brevity.
| id | evid | time | dur | dose | addl | ii | input | out | outeq | cens | c0 | c1 | c2 | c3 | wt | africa | age | gender | height |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 | 1 | 21 | 1 | 160 |
| 1 | 1 | 24 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 | 1 | 21 | 1 | 160 |
| 1 | 1 | 48 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 | 1 | 21 | 1 | 160 |
| 1 | 1 | 72 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 | 1 | 21 | 1 | 160 |
| 1 | 1 | 96 | 0 | 600 | NA | NA | 1 | NA | NA | NA | NA | NA | NA | NA | 46.7 | 1 | 21 | 1 | 160 |
| 1 | 0 | 120 | NA | NA | NA | NA | NA | 10.44 | 1 | none | NA | NA | NA | NA | 46.7 | 1 | 21 | 1 | 160 |
Most Pmetrics objects are R6 objects. As a reminder, you can use the $ operator to access their data fields and methods. Many of them have a $summary() method that prints a summary of the object to the console and a $plot() method that creates a plot of the object. See PM_data for more information on the PM_data class and its methods.
Note: We recognize that many users are familiar with the “S3 framework” in R, which uses functions like summary(object) and plot(object). To comply with better programming standards, Pmetrics uses the R6 framework. However, we have provided S3 methods for most functions, so you can use summary(object) and plot(object) if you prefer.
```r
# S3 method to summarize data
summary(exData)
```
`PM_data` has a `plot()` method that creates a plot of the data. See `plot.PM_data` for more information.
```r
exData$plot()
```