Registration Dossier

Administrative data

Endpoint:
water solubility
Type of information:
(Q)SAR
Adequacy of study:
key study
Reliability:
2 (reliable with restrictions)
Rationale for reliability incl. deficiencies:
accepted calculation method
Justification for type of information:
1. SOFTWARE
US EPA WSKOWWIN

2. MODEL (incl. version number)
V1.42 (September 2010)

3. SMILES OR OTHER IDENTIFIERS USED AS INPUT FOR THE MODEL
O=C(CCCN(C)C)CCCCCCC
O=C(CCCCCCCCCCC)CCCN(C)C
O=C(CCCN(C)C)CCCCCCCCCCCCCCCCC

4. SCIENTIFIC VALIDITY OF THE (Q)SAR MODEL
The WSKOWWIN program estimates the water solubility of an organic compound using the compound's log octanol-water partition coefficient (log Kow). A brief description is given below.

Data Collection
A database of more than 8400 compounds with reliably measured log Kow values had already been compiled from available sources. Most experimental values were taken from a "star-list" compilation that had already been critically evaluated or an extensive compilation that includes many "recommended" values based upon critical evaluation. Other log Kow values were taken from sources located through the Environmental Fate Data Base (EFDB) system. A few values were taken from Section 4a, 8d, and 8e submissions to the U.S. EPA under the Toxic Substances Control Act (see http://www.syrres.com/esc/tscats_info.htm).

Water solubilities were collected from the AQUASOL dATAbASE™ of the University of Arizona (Yalkowsky and Dannenfelser, 1990), Syracuse Research Corporation's PHYSPROP© Database (SRC, 1994), and sources located through the Environmental Fate Data Base (EFDB) system. Water solubilities were primarily constrained to the 20-25 °C temperature range, with 25 °C being preferred.

Melting points were collected from sources such as AQUASOL dATAbASE™, PHYSPROP©, and EFDB as well as the Handbook of Chemistry and Physics and the Aldrich Catalog.

Regression & Results
A dataset of 1450 compounds (941 solids, 509 liquids) having reliably measured water solubility, log Kow and melting point was used as the training set for developing the new estimation algorithms for water solubility. Standard linear regressions were used to fit  water solubility (as log S) with log Kow, melting point and molecular weight.

Residual errors from the initial regression fit were examined for compounds sharing common structural features with relatively consistent errors.  On that basis, 12 compound classes were initially identified and added to the regression to comprise a multi-linear regression including log Kow, melting point and/or molecular weight plus 12 correction factors.  Each correction factor is counted a maximum of once per structure [if applicable], no matter how many times the applicable fragment occurs.  For example, the nitro factor in 1,4-dinitrobenzene is counted just once.  A compound either contains a correction factor or it doesn't; therefore, the matrix for the multi-linear regression contained either a 0 or 1 for each correction factor. Appendix E describes the correction factors and coefficients used by WSKOWWIN.
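The once-per-structure rule described above can be illustrated with a minimal Python sketch. It is not WSKOWWIN code: the function name, the input format (a mapping of fragment names to occurrence counts) and the factor name "amino" are hypothetical, chosen only to show how a count of two nitro groups still yields a single indicator value of 1.

```python
from typing import Dict, List


def correction_indicators(fragment_counts: Dict[str, int],
                          factor_names: List[str]) -> List[int]:
    """Return one 0/1 indicator per correction factor.

    A factor is flagged with 1 if the fragment occurs at least once,
    no matter how often it occurs, and with 0 if it is absent.
    """
    return [1 if fragment_counts.get(name, 0) > 0 else 0
            for name in factor_names]


# Hypothetical example: 1,4-dinitrobenzene has two nitro groups,
# but the nitro factor is counted just once.
row = correction_indicators({"nitro": 2}, ["nitro", "amino"])
print(row)
```

Each such row would correspond to one compound in the regression matrix; the actual factors and coefficients are those listed in Appendix E and are not reproduced here.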

WSKOWWIN estimates water solubility for any compound with one of two possible equations. The equations are: 

    log S (mol/L)  =  0.796 - 0.854 log Kow - 0.00728 MW + ΣCorrections

    log S (mol/L)  =  0.693 - 0.96 log Kow - 0.0092(Tm-25) - 0.00314 MW + ΣCorrections

(where MW is molecular weight and Tm is the melting point (MP) in °C [used only for solids]) ... The summation of the applicable correction factors (ΣCorrections) is applied.
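The two equations can be evaluated with a short Python sketch. This is an illustration under stated assumptions, not the WSKOWWIN implementation: the function and parameter names are hypothetical, the correction sum is passed in as a single number (the individual coefficients are not given here), and the melting-point equation is chosen whenever a melting point is supplied. The conversion from mol/L to mg/L multiplies by MW (g/mol) and by 1000 (mg/g).

```python
from typing import Optional


def wskowwin_log_s(log_kow: float, mw: float,
                   tm_c: Optional[float] = None,
                   corrections: float = 0.0) -> float:
    """Estimate log S (S in mol/L) from the two equations quoted above.

    tm_c is the melting point in deg C; pass it only for solids with a
    known melting point. corrections is the already summed correction term.
    """
    if tm_c is None:
        # Equation without melting point
        return 0.796 - 0.854 * log_kow - 0.00728 * mw + corrections
    # Equation with melting point
    return (0.693 - 0.96 * log_kow - 0.0092 * (tm_c - 25.0)
            - 0.00314 * mw + corrections)


def log_s_to_mg_per_l(log_s: float, mw: float) -> float:
    """Convert log S (mol/L) to a solubility in mg/L."""
    return 10.0 ** log_s * mw * 1000.0


# Hypothetical inputs, for illustration only
log_s = wskowwin_log_s(log_kow=2.0, mw=100.0)
print(log_s, log_s_to_mg_per_l(log_s, 100.0))
```

Passing the correction sum as one number keeps the sketch independent of the correction table, which would have to be taken from the model documentation.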


Estimation Accuracy
Training
The regression equations used by the WSKOWWIN program were trained with a dataset of 1450 compounds.
WSKOWWIN estimates water solubility with one of two possible equations.  When an experimental melting point is available, WSKOWWIN applies the equation containing both a melting point and the molecular weight (MW) parameters.  In the absence of a melting point, the equation containing just the molecular weight is used to make the estimate.  All compounds in the 1450 compound training set have known melting points or are known to be liquids at 25 °C.

Validation
The WSKOWWIN estimation equations were initially validated on two datasets of compounds that were not included in the model training.  A relatively small dataset was tested that consisted of 85 compounds having experimental log Kow values, but no available melting points.  Many compounds in the 85 compound test set decompose before melting and would theoretically have very high melting points (e.g. amino acids and compounds having multiple nitrogens).
A much larger dataset of 817 compounds was also tested.  All 817 compounds had experimental melting points, but none of the 817 compounds had a reliable experimental log Kow. The log Kow values used for the validation-testing were estimated (primarily using the KOWWIN program available at that time); therefore, the water solubility estimates are based on estimates for log Kow. Typically, estimates based on estimates reduce estimation accuracy, but this type of validation can provide insight into the ability of the method.

The WSKOWWIN program applies an individual correction factor only once per structure [if at all] regardless of how many instances of the applicable structural feature occur in the structure. The minimum number of instances is zero and the maximum is one.

Range of water solubilities in the Training set:

Minimum  =  4 × 10⁻⁷ mg/L (octachlorodibenzo-p-dioxin)

Maximum =  completely soluble (various)

Range of Molecular Weights in the Training set:

Minimum  =  27.03 (hydrocyanic acid)

Maximum =  627.62 (hexabromobiphenyl)

Range of Log Kow values in the Training set:

Minimum  =  -3.89 (aspartic acid)

Maximum =  8.27 (decachlorobiphenyl)

Currently there is no universally accepted definition of model domain.  However, users may wish to consider the possibility that water solubility estimates are less accurate for compounds outside the MW range, water solubility range and log Kow range of the training set compounds.  It is also possible that a compound may have a functional group(s) or other structural features not represented in the training set, and for which no correction factor was developed.  These points should be taken into consideration when interpreting model results.
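A simple range check against the training-set limits listed above might look like the following Python sketch. The function name and return format are hypothetical, and the check covers only the three numeric ranges; it says nothing about functional groups or structural features that are missing from the training set, and an empty result should not be read as proof that a compound is inside the model domain.

```python
from typing import List, Optional

# Training-set limits as quoted in this section
MW_RANGE = (27.03, 627.62)          # molecular weight
LOG_KOW_RANGE = (-3.89, 8.27)       # log Kow
MIN_SOLUBILITY_MG_L = 4e-7          # lowest training-set solubility, mg/L


def outside_training_ranges(mw: float, log_kow: float,
                            solubility_mg_l: Optional[float] = None
                            ) -> List[str]:
    """List the properties that fall outside the training-set ranges."""
    flags = []
    if not MW_RANGE[0] <= mw <= MW_RANGE[1]:
        flags.append("MW")
    if not LOG_KOW_RANGE[0] <= log_kow <= LOG_KOW_RANGE[1]:
        flags.append("log Kow")
    # No upper solubility limit: the training set includes
    # completely soluble compounds.
    if solubility_mg_l is not None and solubility_mg_l < MIN_SOLUBILITY_MG_L:
        flags.append("water solubility")
    return flags


# Hypothetical inputs, for illustration only
print(outside_training_ranges(mw=700.0, log_kow=9.0))
```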

Data source

Reference
Reference Type:
other: unpublished calculation
Title:
Determination of physico-chemical properties and environmental fate using EPIWIN v4.0
Author:
Evonik
Year:
2009
Bibliographic source:
http://epa.gov/oppt/exposure/pubs/episuite.htm

Materials and methods

Test guideline
Qualifier:
no guideline followed
Principles of method if other than guideline:
Amides, C8-18 even numbered, C18 unsatd., N-[3-(dimethylamino)propyl] represent a mixture containing C8-, C10-, C12-, C14-, C16-, C18- and C18 unsatd. alkyl chains. Because of this and the variable composition of the substance (the alkyl chain distribution depends on the origin of the fatty acid), calculating physico-chemical properties for the mixture as a whole is not feasible. To obtain an indication of the physico-chemical data, calculations were conducted for the C8 derivate, the C12 derivate (present at 40-60 percent in the substance) and the C18 derivate. The water solubilities of the C8, C12 and C18 derivates were calculated using EPIWIN v4.0, WSKOW v1.41.
GLP compliance:
no
Remarks:
not applicable
Type of method:
other: QSAR

Test material

Constituent 1
Reference substance name:
Amides, C8-18, C18 unsatd, N-[3-(dimethylamino)propyl]
IUPAC Name:
Amides, C8-18, C18 unsatd, N-[3-(dimethylamino)propyl]

Results and discussion

Water solubility
Water solubility:
257 mg/L
Temp.:
25 °C
Remarks on result:
other: C8 derivate
Water solubility:
2.6 mg/L
Temp.:
25 °C
Remarks on result:
other: C12 derivate
Water solubility:
0.003 mg/L
Temp.:
25 °C
Remarks on result:
other: C18 derivate

Applicant's summary and conclusion

Conclusions:
Interpretation of results (migrated information): other: insoluble - soluble
Amides, C8-18 even numbered, C18 unsatd., N-[3-(dimethylamino)propyl] represent a mixture containing C8-, C10-, C12-, C14-, C16-, C18- and C18 unsatd. alkyl chains. Because of this and the variable composition of the substance (the alkyl chain distribution depends on the origin of the fatty acid used), calculating physico-chemical properties for the mixture as a whole is not feasible. To obtain an indication of the physico-chemical data, calculations were conducted for the C8, C12 and C18 derivates. Based on log Kow values of 2.44 (C8 derivate), 4.4 (C12 derivate) and 7.35 (C18 derivate), water solubilities of 257 mg/L (C8 derivate), 2.623 mg/L (C12 derivate) and 0.0025 mg/L (C18 derivate), each at 25 °C, were calculated using EPIWIN v4.0, WSKOW v1.41. Calculation of the water solubilities using the fragment method yielded values of 4660 mg/L (C8 derivate), 41.269 mg/L (C12 derivate) and 0.032 mg/L (C18 derivate), each at 25 °C. According to the classification scheme, the C8 derivate can be regarded as moderately soluble (result based on the log Kow estimate) or soluble (result obtained via the fragment method), the C12 derivate as slightly soluble and the C18 derivate as insoluble (results based on log Kow estimate and fragment method). Because information on the applicability of the calculation model to the substance under investigation is missing, the results should be treated with care.