Customizing pyPheWAS

Want to use a home-made PheCode map? Wish that pyPheWAS supported linear regression? Well, this is the guide for you! pyPheWAS is developed by grad students in the MASI lab, so it may not contain every feature a user could want. Luckily, where there’s a will (and a modest amount of Python knowledge), there’s a way.

On this page we cover:

  • Custom PheCode Maps
  • Alternative Regression Methods
  • Installing Local Changes

To get started, open a terminal and clone the pyPheWAS GitHub repo:

git clone https://github.com/MASILab/pyPheWAS

After making local changes, make sure you install your modified package by following the instructions in the Installing Local Changes section.


Custom PheCode Maps

1. Formatting the PheCode Map

The easiest way to integrate a custom map is to match its formatting to that of pyPheWAS’s default map. Keep in mind that separate maps are required for ICD9 and ICD10 codes. The default ICD9-PheCode map is constructed as follows (* marks a required column).

Column Name         Type   Description
ICD9*               str    ICD version 9 code
ICD9 String*        str    Text description for ICD 9 code
PheCode*            str    Phenotype Code
Phenotype*          str    Text description for PheCode
Excl. Phecodes      str    Range of PheCodes to exclude when using this code to define case subjects
Excl. Phenotypes    str    Text description for Excl. Phecodes
category            int    Numeric category
category_string     str    Text description for category

Notes
  • Only four columns are required; these are sufficient for running the full pyPheWAS pipeline (i.e. Lookup, Model, Plot).
  • Leaving out the optional columns Excl. Phecodes and Excl. Phenotypes will disable the --exclude_phecode=map option in the createPhenotypeFile tool.
  • Leaving out the optional columns category and category_string will disable category coloring in the Manhattan and Log Odds plots produced by pyPhewasPlot.
  • The ICD10-PheCode map is identical to above, with the exception of columns ICD9 and ICD9 String being replaced by ICD10 and ICD10 String, respectively.
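
Before wiring a custom map into pyPheWAS, it can help to confirm that its header matches the table above. The following is a minimal sketch, not part of pyPheWAS: the helper name check_map_header is hypothetical, the sample row is made up, and the sketch assumes the map is a comma-separated file whose header uses exactly the column names listed above.

```python
import csv
import io

# Column names taken from the ICD9 table above
REQUIRED = ['ICD9', 'ICD9 String', 'PheCode', 'Phenotype']
OPTIONAL = ['Excl. Phecodes', 'Excl. Phenotypes', 'category', 'category_string']


def check_map_header(csv_file, required=REQUIRED, optional=OPTIONAL):
    """Return (missing_required, missing_optional) for an open CSV map file.

    Hypothetical helper for illustration; it only inspects the header row.
    """
    header = next(csv.reader(csv_file))
    missing_required = [c for c in required if c not in header]
    missing_optional = [c for c in optional if c not in header]
    return missing_required, missing_optional


# In-memory stand-in for a custom map containing only the required columns
# (the data row is a placeholder, not a real ICD9-PheCode mapping)
sample = io.StringIO(
    'ICD9,ICD9 String,PheCode,Phenotype\n'
    '000.0,Example ICD9 description,000.0,Example phenotype\n'
)
missing_required, missing_optional = check_map_header(sample)
print('missing required columns:', missing_required)
print('missing optional columns:', missing_optional)
```

For an ICD10 map, pass a required list with ICD10 and ICD10 String in place of the ICD9 names. To check a file on disk, open it with open(path, newline='') and pass the file object instead of the in-memory sample.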

2. Incorporating the map into pyPheWAS

The only necessary modification in this case will be telling pyPheWAS where to find the map file. First, save your custom PheCode map in the resources folder of your local clone of the pyPheWAS repository: pyPheWAS/pyPheWAS/resources/. This is also where the default maps are stored.

Next, tell pyPheWAS to load your custom map as the ICD9, ICD10, or CPT map. This is done at the end of the pyPhewasCorev2.py script (pyPhewasCorev2.py:L1154), as shown below.

#-----------------------------------------------------------------------------
# load ICD maps (pyPheWAS)
# Change the filename argument to match the name of your custom PheCode map(s)
icd9_codes = get_codes('phecode_map_v1_2_icd9.csv')
icd10_codes = get_codes('phecode_map_v1_2_icd10_beta.csv')
# load CPT maps (pyProWAS)
# Change the filename argument to match the name of your custom ProCode map
cpt_codes = get_codes('prowas_codes.csv')
#-----------------------------------------------------------------------------
Notes
  • The ICD9 and ICD10 PheCode maps are merged to obtain one master PheCode list.
  • To install your changes and continue using the command line pyPheWAS interface, make sure you follow the instructions in the Installing Local Changes section.

Alternative Regression Methods

Changes to the regression method may be implemented by modifying the function pyPheWAS.pyPhewasCorev2.fit_pheno_model(). This function accepts feature arrays and other regression settings and returns a statistics vector for the fitted regression. Though technically any statistics package may be used to implement an alternative regression method, we recommend using the statsmodels package.

Once you have picked the alternative model from statsmodels, you will need to make several modifications to fit_pheno_model.

Note

The fit_pheno_model function contains two distinct regression fits: one with regularization (lr=1) and one without regularization (lr=0). The implementation details of these methods vary slightly. However, the fit without regularization is only used when the --legacy flag is activated in pyPhewasModel, so only the regularized fit implementation will be described here.

1. Setting up the regression function

The Logit regression function object is declared by passing the constructor an array of response variable values and a matrix of predictor values (as shown in the snippet from pyPhewasCorev2.py:L412 below). Modify this line to match the declaration structure of your alternate regression.

# column 'y' is the PheCode vector
predictors = covariates.replace(" ", "").split('+')
predictors[0] = 'y'
f = [response.strip(), predictors]
logit = sm.Logit(data[f[0]], data[f[1]])

2. Fitting the regression function

The next line fits the regression function with regularization (pyPhewasCorev2.py:L413). Again, modify this as needed to match your alternate regression method.

model = logit.fit_regularized(method='l1', alpha=0.1, disp=0, trim_mode='size', qc_verbose=0)

3. Formatting the regression function stats

Finally, you need to pull the stats from your fitted regression. fit_pheno_model should return the following values (in order): -log10(p-value), p-value, beta, beta’s confidence interval, and beta’s standard error. These are pulled from the fitted model as shown below (pyPhewasCorev2.py:L417). Check the API of your alternative regression model to confirm that it provides each of these values and exposes them under the same attribute names.

# get results for y (the PheCode vector)
p = model.pvalues.y
beta = model.params.y
conf = model.conf_int()
conf_int = '[%s,%s]' % (conf[0]['y'], conf[1]['y'])
stderr = model.bse.y
reg_result = [-math.log10(p), p, beta, conf_int, stderr]  # collect results
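
When adapting this step, one edge case is worth keeping in mind: math.log10 raises a ValueError for an argument of 0, which can happen if a p-value underflows to exactly zero. The sketch below assembles the result list in the order described above and maps that case to infinity. The helper name collect_reg_result and the choice of infinity are assumptions for illustration, not pyPheWAS behavior.

```python
import math


def collect_reg_result(p, beta, ci_low, ci_high, stderr):
    """Return [-log10(p), p, beta, conf_int, stderr] for one regression.

    Hypothetical helper; the inputs are plain numbers already pulled
    from the fitted model.
    """
    # math.log10(0) raises ValueError, so report infinity for p == 0
    neg_log_p = -math.log10(p) if p > 0 else math.inf
    # same string format as the snippet above
    conf_int = '[%s,%s]' % (ci_low, ci_high)
    return [neg_log_p, p, beta, conf_int, stderr]


print(collect_reg_result(0.001, 0.5, 0.1, 0.9, 0.2))
print(collect_reg_result(0.0, 0.5, 0.1, 0.9, 0.2))
```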

Installing Local Changes

After making changes to the pyPheWAS repository, you may install your local version of the package by running the following from a terminal:

cd pyPheWAS # change to the top-level directory of the pyPheWAS repository
python setup.py sdist # build the local package
pip install . --upgrade # install the package