The PHAB package contains materials to calculate an index of physical integrity (IPI) for California streams. This index is estimated using site-specific data and available metrics of physical habitat to describe overall integrity.
This tutorial assumes that the user is familiar with basic operations in the R programming language, such as data import, export, and manipulation. Although not required, we recommend using an integrated development environment for R, such as RStudio, which can be downloaded at http://www.rstudio.com. New users are encouraged to pursue training opportunities, such as those hosted by local R user groups. A list of such groups may be found here: http://blog.revolutionanalytics.com/local-r-groups.html. R training material developed by SCCWRP can also be found online: https://sccwrp.github.io/SCCWRP_R_training/
The core function is IPI(), which requires station and PHAB data as input. Its output for the example data included with the package is shown below.
## StationCode SampleDate SampleAgencyCode PHAB_SampleID IPI
## 1 105PS0107 9/14/2009 SWAMP 105PS0107_9/14/2009_SWAMP 1.16
## 2 205PS0157 6/19/2012 SWAMP 205PS0157_6/19/2012_SWAMP 1.04
## 3 305PS0057 6/16/2009 SWAMP 305PS0057_6/16/2009_SWAMP 0.79
## 4 504PS0147 6/23/2008 SWAMP 504PS0147_6/23/2008_SWAMP 0.78
## 5 632PS0007 7/23/2008 SWAMP 632PS0007_7/23/2008_SWAMP 1.10
## 6 901PS0057 5/14/2008 SWAMP 901PS0057_5/14/2008_SWAMP 1.08
## IPI_percentile Ev_FlowHab Ev_FlowHab_score H_AqHab H_AqHab_pred
## 1 0.90 0.85 0.89 1.59 1.11
## 2 0.62 0.96 1.00 1.42 1.35
## 3 0.04 0.50 0.51 1.32 1.42
## 4 0.03 0.28 0.28 1.24 1.30
## 5 0.79 0.90 0.95 1.51 1.41
## 6 0.74 0.70 0.73 1.52 1.38
## H_AqHab_score H_SubNat H_SubNat_score PCT_SAFN PCT_RC PCT_SAFN_pred
## 1 1.00 1.57 0.83 6 0 24.60
## 2 0.80 1.41 0.74 51 2 22.06
## 3 0.70 0.49 0.26 83 0 29.51
## 4 0.72 0.98 0.52 1 0 34.38
## 5 0.82 1.80 0.95 14 0 13.49
## 6 0.84 1.80 0.95 40 3 34.46
## PCT_SAFN_score XCMG XCMG_pred XCMG_score IPI_qa Ev_FlowHab_qa H_AqHab_qa
## 1 1.00 99 93.64 0.78 1.00 1 1
## 2 0.40 131 104.72 0.90 1.00 1 1
## 3 0.12 152 106.05 1.00 1.00 1 1
## 4 1.00 55 95.41 0.51 1.00 1 1
## 5 0.79 126 123.10 0.76 1.00 1 1
## 6 0.69 122 102.16 0.86 0.95 1 1
## H_SubNat_qa PCT_SAFN_qa XCMG_qa
## 1 1 1 1.00
## 2 1 1 1.00
## 3 1 1 1.00
## 4 1 1 1.00
## 5 1 1 1.00
## 6 1 1 0.95
The PHAB package can be installed from the R console with just a few lines of code. The current version of the package is hosted on SCCWRP's GitHub page and can be installed using the devtools package, which must itself be installed before PHAB. Start by installing and loading devtools:
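A minimal sketch of this step, using the standard install.packages() and library() calls from base R. The CRAN mirror used is whichever one your R session is configured with.

```r
# Install devtools from CRAN (only needed once per R installation)
install.packages("devtools")

# Load devtools for the current session
library(devtools)
```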
Now the install_github() function from devtools can be used to install PHAB from GitHub. The package can be loaded after installation.
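The call might look like the sketch below. The repository path "SCCWRP/PHAB" is an assumption based on the package name and the SCCWRP GitHub organization; confirm it against the GitHub page before running.

```r
# Install PHAB from GitHub; "SCCWRP/PHAB" is the assumed repository path
devtools::install_github("SCCWRP/PHAB")

# Load the package for the current session
library(PHAB)
```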
The installation process may take a few seconds. Both the devtools and PHAB packages depend on other R packages, all of which are installed together. If an error is encountered during installation, an informative message is usually printed in the R console. This information can help troubleshoot the problem, such as identifying which dependent packages may need to be installed separately. Please contact SCCWRP staff if additional errors are encountered.
After the package is successfully installed, you will be able to view the help file for the PHAB core function:
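For example, either of the standard R help calls below should open the documentation, assuming the package has been installed and the function is exported under the name IPI:

```r
library(PHAB)

# Open the help page for the core function
?IPI

# Equivalent form that names the package explicitly
help("IPI", package = "PHAB")
```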
The IPI is estimated using station and PHAB metric data as input. Station GIS data can be obtained using the GIS instructions that accompany this document, and PHAB metric data can be obtained from the State of California SWAMP reporting module. Sample data are provided with the PHAB package to demonstrate the required format and are available automatically once the package is installed and loaded (i.e., just type the name of the sample data in the R console to view). You can view the stations and phab example data from the R console:
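A short sketch of how the two example data sets could be inspected, assuming the PHAB package is installed and that the objects are named stations and phab as described above:

```r
library(PHAB)

# First rows of the example station (GIS) data, one row per site
head(stations)

# First rows of the example PHAB metric data, in long format
head(phab)
```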
## StationCode MAX_ELEV AREA_SQKM ELEV_RANGE MEANP_WS New_Long SITE_ELEV
## 1 105PS0107 2587 2002.90397 2050.96 882.4179 -123.0173 536.04
## 2 205PS0157 1111 600.95071 1074.00 604.9312 -121.8352 37.00
## 3 305PS0057 1152 1261.70747 1107.36 544.3272 -121.5114 44.64
## 4 504PS0147 2144 1954.62572 2059.52 781.2332 -122.2178 84.48
## 5 632PS0007 3348 98.66497 1240.06 962.0929 -119.5937 2107.94
## 6 901PS0057 1734 92.31964 1671.43 495.7878 -117.6696 62.57
## KFCT_AVE New_Lat MINP_WS PPT_00_09
## 1 0.1776000 41.71375 10.9294 54996.9
## 2 0.2927333 37.30141 0.7059 40023.9
## 3 0.2840000 36.95052 0.6623 42432.2
## 4 0.2181000 39.77572 2.7024 53740.0
## 5 0.1217000 38.53335 17.9569 67846.9
## 6 0.2854000 33.52814 0.5268 29440.0
## StationCode SampleDate SampleAgencyCode Variable Result Count_Calc
## 1 105PS0107 9/14/2009 SWAMP W1_HALL_EMAP 0.00 11
## 2 105PS0107 9/14/2009 SWAMP W1_HALL_SWAMP 0.00 14
## 3 105PS0107 9/14/2009 SWAMP PCT_CPOM 24.00 105
## 4 105PS0107 9/14/2009 SWAMP Ev_AqHab 0.77 8
## 5 105PS0107 9/14/2009 SWAMP Ev_FlowHab 0.85 4
## 6 105PS0107 9/14/2009 SWAMP Ev_SubNat 0.75 8
The stations data include site-specific information derived from geospatial analysis. These data are in wide format where one row corresponds to data for one site. The following fields are required:

- StationCode: unique station identifier
- MAX_ELEV: maximum elevation in the watershed in meters
- AREA_SQKM: watershed area in square kilometers
- ELEV_RANGE: elevation range of the watershed in meters
- MEANP_WS: mean phosphorus geology from the watershed
- New_Long: site longitude, decimal degrees
- SITE_ELEV: site elevation
- KFCT_AVE: average soil erodibility factor
- New_Lat: site latitude, decimal degrees
- MINP_WS: minimum phosphorus geology from the watershed
- PPT_00_09: average precipitation (2000 to 2009) at the sample point, in hundredths of millimeters

The phab data include calculated physical habitat metrics that are compiled along with the stations data to get the IPI score. These data are in long format where multiple rows correspond to physical habitat metric values for a single site. The following fields are required:

- StationCode: unique station identifier
- SampleDate: date of the sample
- SampleAgencyCode: code of the agency that collected the data
- Variable: name of the PHAB metric
- Result: resulting metric value
- Count_Calc: number of unique observations that were used to estimate the value in Result
Values in the Variable column of the phab data indicate which PHAB metric was measured and correspond to values in the Result column. The required PHAB metrics that should be provided for every unique sampling event specified by StationCode and SampleDate are as follows:

- XSLOPE: mean slope of reach
- XBKF_W: mean bankfull width
- H_AqHab: Shannon diversity of aquatic habitat types
- PCT_SAFN: percent sand and fine (<2 mm) substrate particles
- XCMG: riparian cover sum of three layers
- Ev_FlowHab: evenness of flow habitat types
- H_SubNat: Shannon diversity of natural substrate types
- XC: mean upper canopy trees and saplings
- PCT_POOL: percent pools in reach
- XFC_ALG: mean filamentous algae cover
- PCT_RC: percent concrete/asphalt

Each PHAB metric serves a specific purpose in the calculation and reporting of the IPI. Some metrics (i.e., H_AqHab, PCT_SAFN, XCMG, Ev_FlowHab, and H_SubNat) are used to assess habitat condition. Other metrics (i.e., XSLOPE, XBKF_W, and PCT_RC) are used as predictors or score modifiers for components of the IPI. Finally, some metrics (i.e., XC, PCT_POOL, and XFC_ALG) are required because they are used for quality assurance checks.
All required fields for the stations and phab data are case-sensitive and must be spelled correctly. The order of the fields does not matter. All StationCode values must be shared between the datasets. As described below, the IPI() function automatically checks the format of the input data prior to estimating scores.
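Before calling IPI(), it can be useful to run a quick check of your own. The sketch below uses only base R and made-up toy data; the helper names missing_fields() and unshared_stations() are hypothetical and are not part of the PHAB package.

```r
# Required field names as listed above
station_fields <- c("StationCode", "MAX_ELEV", "AREA_SQKM", "ELEV_RANGE",
                    "MEANP_WS", "New_Long", "SITE_ELEV", "KFCT_AVE",
                    "New_Lat", "MINP_WS", "PPT_00_09")
phab_fields <- c("StationCode", "SampleDate", "SampleAgencyCode",
                 "Variable", "Result", "Count_Calc")

# Hypothetical helper: required fields missing from a data frame.
# The comparison is case-sensitive, matching the requirement above.
missing_fields <- function(dat, required) {
  setdiff(required, names(dat))
}

# Hypothetical helper: station codes found in only one of the two data sets
unshared_stations <- function(stations_dat, phab_dat) {
  list(
    only_in_stations = setdiff(stations_dat$StationCode, phab_dat$StationCode),
    only_in_phab = setdiff(phab_dat$StationCode, stations_dat$StationCode)
  )
}

# Toy inputs (made-up values) with a misspelled field and unmatched sites
toy_stations <- data.frame(StationCode = c("A1", "B2"),
                           max_elev = c(100, 200),
                           stringsAsFactors = FALSE)
toy_phab <- data.frame(StationCode = c("A1", "C3"), SampleDate = "1/1/2020",
                       SampleAgencyCode = "X", Variable = "XSLOPE",
                       Result = c(1.2, 0.8), Count_Calc = 10,
                       stringsAsFactors = FALSE)

missing_station_fields <- missing_fields(toy_stations, station_fields)
missing_phab_fields <- missing_fields(toy_phab, phab_fields)
unshared <- unshared_stations(toy_stations, toy_phab)
```

Here the lowercase max_elev does not count as MAX_ELEV, so it is reported as missing along with the other absent station fields.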
Five of the required PHAB metrics in the input data are used directly for scoring the IPI, whereas the remainder serve a supporting role as predictors or modifiers for different parts of the complete index. Understanding what each of the five core metrics describes about stream condition and how it varies with disturbance is critical for interpreting the index. Below is a detailed description of each metric (excerpted from the tech memo).
Shannon diversity of natural instream cover. H_AqHab measures the relative quantity and variety of natural structures in the stream, such as cobble, large and small boulders, fallen trees, logs and branches, and undercut banks available as refugia, or as sites for feeding or spawning and nursery functions of aquatic macrofauna. A wide variety and/or abundance of submerged structures in the stream provides macroinvertebrates and fish with a large number of niches, thus increasing habitat diversity. When variety and abundance of cover decreases (e.g., due to hydromodification, increased sedimentation, or active stream clearing), habitat structure becomes monotonous, diversity decreases, and the potential for recovery following disturbance decreases. Snags and submerged logs, especially old logs that have remained in place for several years, are among the most productive habitat structures for macroinvertebrate colonization and fish refugia in low-gradient streams.
Percent sand and fine substrate. PCT_SAFN measures the amount of small-grained sediment particles (i.e., <2 mm) that have accumulated in the stream bottom as a result of deposition. Deposition may result from soil disturbance in the catchment, landslides, and bank erosion. Sediment deposition may cause the formation of islands or point bars, filling of runs and pools, and embeddedness of gravel, cobble, boulders, and snags, with larger substrate particles covered or sunken into the silt, sand, or mud of the stream bottom. As habitat provided by cobbles or woody debris becomes embedded, and as interstitial spaces become inundated by sand or silt, the surface area available to macroinvertebrates and fish is decreased. High levels of sediment deposition are symptoms of an unstable and continually changing environment that becomes unsuitable for many organisms. Although human activity may deplete sands and fines (e.g., by upstream dam operations), and this depletion may harm aquatic life, the IPI treats only increases in this metric as a negative impact on habitat quality. As a post-hoc correction, the metric percent concrete (PCT_RC) is added to PCT_SAFN before scoring.
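The adjustment described above amounts to simple addition. The sketch below uses the PCT_SAFN and PCT_RC values from the example output shown earlier. The variable name pct_safn_adj is hypothetical, and any further handling inside the package (for example, limits on the combined value) is not described here and is not assumed.

```r
# Values taken from the example output shown earlier (first six sites)
pct_safn <- c(6, 51, 83, 1, 14, 40)  # percent sand and fines
pct_rc   <- c(0, 2, 0, 0, 0, 3)      # percent concrete/asphalt

# Combined value carried into scoring, per the description above
pct_safn_adj <- pct_safn + pct_rc
pct_safn_adj
```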
Shannon diversity of natural substrate types. H_SubNat measures the diversity of natural substrate types, assessing how well multiple size classes (e.g., gravel, cobble and boulder particles) are represented. In a stream with high habitat quality for benthic macroinvertebrates, layers of cobble and gravel provide diversity of niche space. Occasional patches of fine sediment, root mats and bedrock also provide important habitat for burrowers or clingers, but do not dominate the streambed. Lack of substrate diversity, e.g., where >75% of the channel bottom is dominated by one particle size or hard-pan, or with highly compacted particles with no interstitial space, represents poor physical conditions. Riffles and runs with a diversity of particle sizes often provide the most stable habitat in many small, high-gradient streams.
Evenness of flow habitat types. Ev_FlowHab measures the evenness of riffles, pools, and other flow microhabitat types. Optimal physical conditions include a relatively even mix of velocity/depth regimes, with regular alternation between riffles (fast-shallow), runs (fast-deep), glides (slow-shallow) and pools (slow-deep). Poor conditions occur when a single microhabitat dominates (usually glides, with pools and riffles absent). A stream that has a uniform flow regime will typically support far fewer types of organisms than a stream that has a variety of alternating flow regimes. Riffles in particular are a source of high-quality habitat and diverse fauna, and their regular occurrence along the length of a stream greatly enhances the diversity of the stream community. Pools are essential for many fish and amphibians.
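To make the diversity and evenness metrics more concrete, the sketch below computes the textbook Shannon diversity index and the corresponding evenness (diversity divided by its maximum) for made-up habitat proportions. This is a generic illustration only. The PHAB metrics themselves come from the SWAMP reporting module, and the exact categories, logarithm base, and conventions used there are assumptions here. The function names are hypothetical.

```r
# Shannon diversity H = -sum(p * log(p)) over categories with p > 0
shannon_diversity <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Evenness as H divided by its maximum, log(number of categories present).
# Undefined (NA) when fewer than two categories are present.
evenness <- function(x) {
  s <- sum(x > 0)
  if (s < 2) return(NA_real_)
  shannon_diversity(x) / log(s)
}

# Hypothetical percent cover of four flow habitat types in two reaches
even_reach <- c(riffle = 25, run = 25, glide = 25, pool = 25)
glide_dominated <- c(riffle = 2, run = 3, glide = 90, pool = 5)

h_even <- shannon_diversity(even_reach)
ev_even <- evenness(even_reach)
ev_glide <- evenness(glide_dominated)
```

The evenly mixed reach reaches the maximum evenness of 1, whereas the glide-dominated reach scores much lower, which mirrors the "single microhabitat dominates" condition described above.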
Riparian vegetation cover, sum of three layers. XCMG measures the amount of vegetative protection afforded to the stream bank and the near-stream portion of the riparian zone. The root systems of plants growing on stream banks help hold soil in place, thereby reducing the amount of erosion likely to occur. The vegetative zone also serves as a buffer to pollutants entering a stream from runoff and provides shading and habitat and nutrient input into the stream. Banks that have full, multi-layered, natural plant growth are better for fish and macroinvertebrates than are banks without vegetative protection or those shored up with concrete or riprap. Vegetative removal and reduced riparian zones occur when roads, parking lots, fields, lawns, bare soil, riprap, or buildings are near the stream bank. Residential developments, urban centers, golf courses, and high grazing pressure from livestock are the common causes of anthropogenic degradation of the riparian zone. Even in undeveloped areas, upstream hydromodification and invasion by non-native species can reduce the cover and quality of riparian zone vegetation.
IPI() function

The IPI score for a site is estimated from the station and PHAB data. The score is estimated automatically by the IPI() function in the package following several steps. First, reference expectations for a site are estimated for predictive metrics using the station data. Then, observed data values are compared to reference expectations for predictive metrics and the differences between observed and predicted (i.e., metric residuals) are used for scoring. For metrics that are not predicted, raw metric values are used for scoring. Metric scores are based on the upper and lower percentiles of either metric residuals or raw metric values observed at reference and high-activity sites. The metric scores are then summed and standardized (i.e., divided) by the mean sum of scores at reference sites to obtain the final IPI score.
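The steps above can be sketched in a few lines of base R. This is an illustration of the general idea and not the package's implementation: the helper score_metric() is hypothetical, all anchor values and the reference mean are made up, and the truncation of scores to the range 0 to 1 is an assumption.

```r
# Hypothetical helper: rescale a value between two anchors, truncated to [0, 1].
# For a metric that increases with degradation, pass the larger anchor as
# `worst` and the smaller anchor as `best`.
score_metric <- function(value, worst, best) {
  score <- (value - worst) / (best - worst)
  pmin(pmax(score, 0), 1)
}

# Predicted metric: score the residual (observed minus predicted)
observed <- 1.59
predicted <- 1.11
residual <- observed - predicted
resid_score <- score_metric(residual, worst = -0.9, best = 0.4)  # made-up anchors

# Metric without a prediction: score the raw value
raw_score <- score_metric(0.50, worst = 0.05, best = 0.95)       # made-up anchors

# Sum the metric scores and divide by the mean sum at reference sites
metric_scores <- c(resid_score, raw_score, 0.8, 0.6, 0.9)  # other scores made up
mean_ref_sum <- 4.0                                        # made-up reference mean
ipi <- sum(metric_scores) / mean_ref_sum
```

Dividing by the mean sum at reference sites is what centers the index so that reference-like sites score near 1.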
The IPI() function can be used on station and PHAB data that are correctly formatted as shown above. The stations and phab example data are in the correct format and are loaded automatically with the PHAB package. These data are used here to demonstrate use of the IPI() function.
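A minimal sketch of the call, assuming the package is installed and that the function takes the station data followed by the PHAB data as its first two arguments (check the help file for the exact argument names). The object name results is hypothetical.

```r
library(PHAB)

# Estimate IPI scores from the example station and PHAB data
results <- IPI(stations, phab)

# View the first rows of the output
head(results)
```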
## StationCode SampleDate SampleAgencyCode PHAB_SampleID IPI
## 1 105PS0107 9/14/2009 SWAMP 105PS0107_9/14/2009_SWAMP 1.16
## 2 205PS0157 6/19/2012 SWAMP 205PS0157_6/19/2012_SWAMP 1.04
## 3 305PS0057 6/16/2009 SWAMP 305PS0057_6/16/2009_SWAMP 0.79
## 4 504PS0147 6/23/2008 SWAMP 504PS0147_6/23/2008_SWAMP 0.78
## 5 632PS0007 7/23/2008 SWAMP 632PS0007_7/23/2008_SWAMP 1.10
## 6 901PS0057 5/14/2008 SWAMP 901PS0057_5/14/2008_SWAMP 1.08
## IPI_percentile Ev_FlowHab Ev_FlowHab_score H_AqHab H_AqHab_pred
## 1 0.90 0.85 0.89 1.59 1.11
## 2 0.62 0.96 1.00 1.42 1.35
## 3 0.04 0.50 0.51 1.32 1.42
## 4 0.03 0.28 0.28 1.24 1.30
## 5 0.79 0.90 0.95 1.51 1.41
## 6 0.74 0.70 0.73 1.52 1.38
## H_AqHab_score H_SubNat H_SubNat_score PCT_SAFN PCT_RC PCT_SAFN_pred
## 1 1.00 1.57 0.83 6 0 24.60
## 2 0.80 1.41 0.74 51 2 22.06
## 3 0.70 0.49 0.26 83 0 29.51
## 4 0.72 0.98 0.52 1 0 34.38
## 5 0.82 1.80 0.95 14 0 13.49
## 6 0.84 1.80 0.95 40 3 34.46
## PCT_SAFN_score XCMG XCMG_pred XCMG_score IPI_qa Ev_FlowHab_qa H_AqHab_qa
## 1 1.00 99 93.64 0.78 1.00 1 1
## 2 0.40 131 104.72 0.90 1.00 1 1
## 3 0.12 152 106.05 1.00 1.00 1 1
## 4 1.00 55 95.41 0.51 1.00 1 1
## 5 0.79 126 123.10 0.76 1.00 1 1
## 6 0.69 122 102.16 0.86 0.95 1 1
## H_SubNat_qa PCT_SAFN_qa XCMG_qa
## 1 1 1 1.00
## 2 1 1 1.00
## 3 1 1 1.00
## 4 1 1 1.00
## 5 1 1 1.00
## 6 1 1 0.95
A data frame of IPI scores estimated at each site on each unique sample date is returned. The output data are in wide format with one row for each sample date at a site. Detailed information for each output column is as follows:
- StationCode: unique station identifier
- SampleDate: date of the site visit
- PHAB_SampleID: unique identifier of the sampling event. Typically, the station code and sample date are sufficient to determine unique sampling events.
- IPI: score for the index of physical integrity
- IPI_percentile: the percentile of the IPI score relative to scores at reference sites
- Ev_FlowHab: evenness of flow habitat types, from the raw PHAB metric
- Ev_FlowHab_score: IPI score for evenness of flow habitat types
- H_AqHab: Shannon diversity of natural instream cover types, from the raw PHAB metric
- H_AqHab_pred: predicted Shannon diversity of natural instream cover types
- H_AqHab_score: scored Shannon diversity of natural instream cover types
- H_SubNat: Shannon diversity of natural substrate types, from the raw PHAB metric
- H_SubNat_score: scored Shannon diversity of natural substrate types
- PCT_SAFN: percent sand and fine substrate, from the raw PHAB metric
- PCT_RC: percent concrete/asphalt, from the raw PHAB metric; it is combined with PCT_SAFN to provide an overall estimate of substrate with poor suitability for macrofauna
- PCT_SAFN_pred: predicted percent sand and fine substrate
- PCT_SAFN_score: scored percent sand and fine substrate, includes PCT_RC
- XCMG: riparian cover as sum of three layers, from the raw PHAB metric
- XCMG_pred: predicted riparian cover as sum of three layers
- XCMG_score: scored riparian cover as sum of three layers
- IPI_qa: quality assurance for the IPI score
- Ev_FlowHab_qa: quality assurance for evenness of flow habitat types
- H_AqHab_qa: quality assurance for Shannon diversity of aquatic habitat types
- H_SubNat_qa: quality assurance for Shannon diversity of natural substrate types
- PCT_SAFN_qa: quality assurance for percent sand and fine substrate
- XCMG_qa: quality assurance for riparian cover as sum of three layers

Metrics are included in the output as observed PHAB metrics, predicted metrics (where applicable), and scored metrics. Observed PHAB metrics are returned as-is from the input data. Some PHAB metrics include a predicted column that shows the modelled metric value based on the environmental setting at a site. Scored PHAB metrics are obtained following the description above.
The last six columns include quality assurance information for the IPI score and select metrics. QA values less than one indicate less quality assurance, usually resulting from metric values being calculated from fewer measurements from a sample than specified by field protocols. IPI_qa (the overall QA value for the IPI) is based on the lowest score among all metrics. At this time, there is no criterion for flagging an IPI score based on QA measurements; analysts are advised to use their personal judgment when evaluating IPI scores with low QA measurements.
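As an illustration of how the overall QA value relates to the metric QA values, the sketch below takes the row-wise minimum for two rows of the example output shown above (the fifth and sixth sites). The object names are hypothetical, and screening on values below 1 is only one possible choice, since no flagging criterion is defined.

```r
# Metric QA columns for the fifth and sixth rows of the example output
qa <- data.frame(
  Ev_FlowHab_qa = c(1, 1),
  H_AqHab_qa    = c(1, 1),
  H_SubNat_qa   = c(1, 1),
  PCT_SAFN_qa   = c(1, 1),
  XCMG_qa       = c(1, 0.95)
)

# Overall QA value as the row-wise minimum across the metric QA values
ipi_qa <- do.call(pmin, qa)

# Rows that may deserve a closer look by the analyst
needs_review <- which(ipi_qa < 1)
```

The second row returns 0.95, matching the IPI_qa value reported for that site in the example output.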
The IPI was calibrated during its development so that the mean score of reference sites is 1; IPI scores near 1 represent locations with conditions similar to reference sites. Scores that approach 0 indicate great departure from reference condition and degradation of physical condition. Scores > 1 can be interpreted to indicate greater physical complexity than predicted for a site given its natural environmental setting. All metric scores are weighted equally to determine the overall IPI score. For observed and scored PHAB metrics, all are expected to decrease under degraded physical conditions, except PCT_SAFN which is expected to increase in response to degradation.
An additional data file is available within the PHAB package that shows calibration data for scoring the IPI metrics. This file is called refcal and includes observed and predicted scores at reference and high-activity (or “stressed”) sites for the five PHAB metrics. Metrics are scored based on deviation from the 5th and 95th percentile of scores at reference or calibration sites. The refcal dataset includes observations at these sites that were used to identify percentile cutoffs for estimating metric scores (the dataset is loaded automatically with the PHAB package, just type the name in the console to view).
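Percentile cutoffs of this kind can be summarized with base R. The sketch below uses a small made-up data frame, refcal_toy, laid out like refcal; which percentile and site set anchor each metric score in the package is not specified here, so this only shows how such cutoffs could be tabulated.

```r
# Toy data in the same layout as refcal (all values are made up)
refcal_toy <- data.frame(
  Variable = "Ev_FlowHab",
  SiteSet  = rep(c("RefCal", "StressCal"), each = 5),
  Result   = c(0.60, 0.70, 0.80, 0.90, 1.00, 0.10, 0.30, 0.50, 0.70, 0.90),
  stringsAsFactors = FALSE
)

# 5th and 95th percentiles of Result for each metric and site set
cutoffs <- aggregate(Result ~ Variable + SiteSet, data = refcal_toy,
                     FUN = quantile, probs = c(0.05, 0.95))
cutoffs
```

With PHAB loaded, substituting refcal for refcal_toy applies the same summary to the packaged calibration data, and head(refcal) prints its first rows: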
## Variable StationCode SampleID2 SiteSet Result Predicted
## 1 Ev_FlowHab 000CAT228 000CAT22840400 RefCal 0.77 0.6498149
## 2 Ev_FlowHab 101WE1111 101WE111137474 StressCal 0.94 0.6535123
## 3 Ev_FlowHab 103CDCHHR 103CDCHHR40435 RefCal 0.63 0.7295132
## 4 Ev_FlowHab 103WER026 103WER02637831 RefCal 0.75 0.6590854
## 5 Ev_FlowHab 103WER029 103WER02937832 RefCal 0.94 0.7125525
## 6 Ev_FlowHab 105BVCAGC 105BVCAGC40442 RefCal 0.84 0.6973526
- Variable: name of the PHAB metric
- StationCode: unique station identifier
- SampleID2: unique identifier of the sampling event
- SiteSet: indicates whether a site was reference or stressed
- Result: resulting metric value
- Predicted: predicted metric value

The IPI() function will evaluate both the stations and phab input datasets for correct format before estimating IPI scores. IPI scores will not be calculated if any errors are encountered. The following checks are made:
- No duplicate station codes in stations. That is, input data have one row per station.
- All station codes in stations are in phab, and vice versa
- All required fields are present in stations and phab (see above)
- All required PHAB metrics are present in the Variable field of phab for each station and sample date (see above). An exception is made for XC, PCT_POOL, and XFC_ALG, which are not necessary for calculating the IPI but are used for optional quality assurance checks.
- All values in stations and phab are non-negative, excluding elevation variables in stations which may be negative if below sea level (i.e., some locations in southeast California). Moreover, the variables XBKF_W and Ev_FlowHab in phab must also be greater than zero.

The IPI() function will print informative messages to the R console if any of these errors are encountered. It is the responsibility of the analyst to correct any errors in the raw data before proceeding.
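To locate problem rows before rerunning IPI(), a few base R checks can help. The sketch below uses made-up toy data with deliberate errors and is not the package's own validation code; all object names are hypothetical.

```r
# Toy inputs with deliberate problems (all values are made up)
toy_stations <- data.frame(StationCode = c("A1", "A1", "B2"),
                           AREA_SQKM = c(10.5, 10.5, -3),
                           stringsAsFactors = FALSE)
toy_phab <- data.frame(StationCode = c("A1", "B2"),
                       Variable = c("XBKF_W", "Ev_FlowHab"),
                       Result = c(0, 0.4),
                       stringsAsFactors = FALSE)

# Station codes that appear on more than one row of the station data
dup_codes <- unique(toy_stations$StationCode[duplicated(toy_stations$StationCode)])

# Rows with a negative value in a field that should be non-negative
neg_rows <- which(toy_stations$AREA_SQKM < 0)

# XBKF_W and Ev_FlowHab must be strictly greater than zero
strict <- toy_phab$Variable %in% c("XBKF_W", "Ev_FlowHab")
zero_rows <- which(strict & toy_phab$Result <= 0)
```

Each result points to the offending station code or row number, which can then be corrected in the raw data.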