Registration Dossier

Data platform availability banner - registered substances factsheets

Please be aware that this old REACH registration data factsheet is no longer maintained; it remains frozen as of 19th May 2023.

The new ECHA CHEM database has been released by ECHA, and it now contains all REACH registration data. There are more details on the transition of ECHA's published data to ECHA CHEM here.

Toxicological information

Epidemiological data

Administrative data

Endpoint:
epidemiological data
Type of information:
experimental study
Adequacy of study:
key study
Study period:
1990-2018
Reliability:
1 (reliable without restriction)
Rationale for reliability incl. deficiencies:
test procedure in accordance with generally accepted scientific standards and described in sufficient detail

Data source

Reference
Reference Type:
publication
Title:
Unnamed
Year:
2020

Materials and methods

Study type:
other: Selection of studies reviewed, largely cross-sectional.
Endpoint addressed:
genetic toxicity
other: haematotoxicity
Test guideline
Qualifier:
no guideline followed
Principles of method if other than guideline:
It is well established that benzene exposure can result in haematotoxic (including immunotoxic), genotoxic and carcinogenic (i.e. leukaemogenic) effects. Early studies (e.g. Aksoy et al., 1971; Goldwater, 1941; Greenburg et al., 1939) found that high benzene exposure had a suppressive effect on peripheral blood cell counts. Lower exposures (e.g. <10 ppm) produce only mild decrements in blood cell counts via changes in marrow function. These decrements in circulating blood cells are likely one of the earliest clinical effects of benzene exposure (DECOS, 2014; North et al., 2020a).
The occurrence of cytogenetic damage in individuals exposed to high levels of benzene has been recognised from the 1960s.
Experimental evidence indicates that the genetic toxicology profile of benzene relates to chromosomal damage rather than gene mutation. In vivo animal studies reinforce the conclusion that gene mutation activity is a negligible part of the genetic toxicology profile of benzene, as discussed in the companion paper (North et al., 2020a).
A complete inventory and literature review of epidemiological studies of benzene exposure reveals that early studies, largely before 1980, focused on higher exposures, where benzene concentrations often exceeded 50 ppm in air. More recent studies have examined effects of lower exposures, and many of these studies have also documented a depression of peripheral blood counts or changes in chromosome aberrations and micronucleus formation, often in the absence of aplastic anaemia or pancytopenia. Especially for genotoxicity, study groups exposed to lower concentrations often come from environments in which a mixture of compounds is present, making it more difficult to ascribe effects to benzene versus other compounds such as polycyclic aromatic hydrocarbons (PAHs).
In this paper, we focus on the numerous studies in the literature that examine benzene exposure and haematotoxicity and genotoxicity. We have scrutinized the quality of these studies, so that the higher quality studies can be used to arrive at more robust conclusions regarding the concentrations of benzene that show definitive haematological and/or genotoxic effects.
GLP compliance:
no

Test material

Constituent 1
Chemical structure
Reference substance name:
Benzene
EC Number:
200-753-7
EC Name:
Benzene
Cas Number:
71-43-2
Molecular formula:
C6H6
IUPAC Name:
benzene
Test material form:
liquid: volatile

Method

Type of population:
occupational
Ethical approval:
not applicable
Details on study design:
Here we describe the methods we used to (a) identify appropriate haematological and genotoxic studies of benzene exposure, (b) classify such studies with regard to epidemiological quality criteria, (c) identify unique study populations, (d) select the highest quality studies, (e) select Lowest Observed Adverse Effect Concentrations (LOAECs) and No Observed Adverse Effect Concentrations (NOAECs), and (f) classify LOAECs and NOAECs with respect to certainty.

The focus of the search strategy was to identify studies on haematotoxicity and genotoxicity in the dose range of interest. Studies published before 1980 (haematotoxicity) and 1990 (genotoxicity) addressed higher exposures, hence searches focused on studies published on or after these years.
Haematotoxicity: Several literature searches using Medline, Toxline, PubMed, Chemical Abstracts, and Web of Science were performed from approximately 1995 through 2015, and key studies on any aspect of haematotoxicity were gleaned from these early searches. The keywords used were hematox#, myelotox#, lymphatic, blood, autoimmune, immunotox, anemia, cytopenia, hematopo#, epidemiol#, worker#, bimonitor# AND benzene (# indicates a wild-card term). This strategy identified 286 potential haematology studies published after 1979 for inclusion. This list was supplemented with reference lists from ATSDR (2007, 2015), ECHA (2018a, b) and IARC (2018), which added another 35 studies. Titles of the 321 studies were reviewed. Of the 321 studies, 206 could be definitively excluded at this stage since they were not epidemiological studies. The full text of the remaining 115 studies was then reviewed. At this stage it was possible to eliminate another 72 studies that would obviously fail the screening criteria for various reasons (e.g. did not study benzene per se, pre-1980 study, etc.). This left 43 studies to be fully assessed for quality by the panel of scorers (including the preliminary screening criteria) as described below.
Genotoxicity: Searches were carried out for the period 1990 to September 2018 using PubMed, Science Direct, TOXNET and EPA databases, and the keywords were genotoxicity OR micronucleus OR chromosome aberration OR DNA damage AND S-phenylmercapturic acid OR blood benzene OR urinary benzene OR benzene. Removal of duplicates identified approximately 1100 potential studies. A review to eliminate non-epidemiological research excluded about 900 studies. The full text of the remaining 200 studies was examined. At this stage it was possible to eliminate another 106 studies, leaving 94 studies to be fully assessed for quality by the scoring panel (see below).
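The screening arithmetic described above can be checked with a short sketch. The function and variable names are illustrative and not part of the source; the genotoxicity counts are the approximate figures quoted in the text.

```python
# Screening tallies as reported in the text; names are illustrative only.
def screen(identified, supplemented, excluded_on_title, excluded_on_full_text):
    """Return (titles reviewed, full texts reviewed, studies sent for scoring)."""
    titles = identified + supplemented
    full_texts = titles - excluded_on_title
    scored = full_texts - excluded_on_full_text
    return titles, full_texts, scored

# Haematotoxicity: 286 from searches plus 35 from reference lists.
haematology = screen(286, 35, 206, 72)
# Genotoxicity: roughly 1100 after duplicate removal (approximate counts).
genotoxicity = screen(1100, 0, 900, 106)

print(haematology)   # (321, 115, 43)
print(genotoxicity)  # (1100, 200, 94)
```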

All studies that were selected from the literature review were further
assessed in terms of their epidemiological quality. The scoring
scheme of Vlaanderen et al. (2008) was used as the most appropriate
starting point. This was due to its (a) applicability to epidemiological
research, (b) applicability to risk assessment, of which OEL
derivation is one form, and (c) sharp focus on exposure assessment
(EA), which is often a weak link in these studies. Two
aspects of the Vlaanderen et al. (2008) guidelines were revised for
the present exercise. First, most studies on haematotoxicity and genotoxicity
due to benzene exposure are shorter-term cross-sectional studies. Thus, the focus is not on long-term exposure over decades (as in
the Vlaanderen criteria), but rather more recent exposure over days,
weeks, or months. Indeed, many of the studies examined individual
exposure, and many also included biomonitoring results. Thus, some of
the EA criteria were modified to suit studies of a shorter-term focus. As
an example, Vlaanderen’s criterion for exposure metric was evaluated
against a strategy that focused on exposure in the weeks to months prior
to phlebotomy, rather than decades in the distant past. Secondly, the
focus in the present exercise is not necessarily arriving at the best dataset
for input into quantitative risk assessment modelling exercises, but
rather arriving at the studies that are the best ones for arriving at
conclusions regarding whether specific concentrations of benzene either
do or do not have an effect at a population level. Thus, there was
more emphasis in our revised guidelines on appropriate exposure categorisation
to aid the process of deriving LOAECs and NOAECs.
Exposure assessment:
measured
Statistical methods:
Statistical analysis within studies was reviewed in the quality scoring criteria as OD-1 STRENGTH OF THE STATISTICAL ANALYSIS (0–5). This criterion assesses (a) whether multivariate methods were used that allow for best estimation of the effect while controlling for important covariates, (b) whether an evaluation of how well the data fit the model was performed, (c) whether the model covered at least three exposure categories, (d) whether the exposure categories were sufficiently accurate to permit a good estimate of a NOAEC or LOAEC, and (e) whether multiple hypothesis evaluation (i.e. multiple comparisons) was accounted for, if applicable.

We made our determination of both LOAECs and NOAECs as transparent as possible.
Most often, results are available for exposure categories (e.g.<1 ppm,
1−5 ppm, etc.). We employed a hierarchical preference for central
tendency measures, if they were available for a given LOAEC or NOAEC
category. We often erred on the side of conservatism in identifying LOAECs and
NOAECs. This meant that the lowest LOAEC was sought, regardless of
whether it was from the best or most complete analysis. Also of note is
that when LOAECs were distributed bimodally, we used the lower
group of LOAECs, even if these were of slightly lower quality.
As such LOAECs and NOAECs carried with them some uncertainty,
even if a study was ranked highly. This was partially accounted for by
classifying LOAECs and NOAECs as more or less certain. This approach
allowed for more robust LOAECs and NOAECs to carry more interpretive
weight, which we regard as a strength of the approach.
The definition of “high quality” was made after inspection of the
distribution of quality scores. Observing no “natural breaks” we relied
on two definitions of high quality – the top tertile and top half of the
distribution. Using both definitions of high quality allowed us to test the
robustness of results using sensitivity analyses.

Results and discussion

Results:
3.1. Quality scoring results for genotoxic and haematological studies
The groups of 31 haematology and 56 genotoxicity study
populations each had a top score of 20 (of a possible 24), which in both
cases was due to the Qu et al. (2003) study. Both haematotoxicity and
genotoxicity studies showed wide ranges (8–20 and 6–20, respectively),
indicating marked differences in study quality for each body of literature.
Due to ties in scores for the haematotoxicity studies, this initial
stratification resulted in 11 studies in the top tertile (score range
14.5–20), 9 studies in the second tertile (score range 11–14), and 16
studies which scored at or above the median score of 12.5 (see Fig. 1
and Table 2). Similarly, for genotoxicity studies, Fig. 2 and Table 3
show the 21 studies in the first tertile (score range 13.5–20), 17 studies
in the second tertile (score range 11–13), and 29 studies at or above the
median score of 12.5.

3.2.1. Haematology
3.2.1.1. Derivation of LOAECs. Table 2 shows that the highest quality
studies (i.e. first tertile) that generated a more certain LOAEC were: Qu
et al., 2003 (2.26 ppm, neutrophils), Schnatter et al., 2010 (7.8 ppm,
neutrophils), Ward et al., 1996 (7.2 ppm, total leukocytes), Lan et al.,
2004 (2.2 ppm, various cell types), Rothman et al., 1996 (7.6 ppm,
lymphocytes), and Zhang et al., 2016 (2.1 ppm, leukocytes). From these
values, a bimodal distribution results, in which there are two clusters of
studies: three studies that suggest a LOAEC near 2 ppm and three
studies that suggest a LOAEC near 7−8 ppm.
Looking at the LOAECs in which all studies at or above the median
quality score are considered as high quality, the LOAECs are similar,
with only (Bogadi-Šare et al., 2003) at 8 ppm added to the above list
from the first tertile. Thus, there are four studies suggesting a LOAEC of
7−8 ppm and three studies suggesting a LOAEC near 2 ppm. This alternative
definition of high quality is a sensitivity analysis that supports
the top tertile result. For the highest quality (top tertile) studies that
generated a less certain LOAEC, values were: (Swaen et al., 2010)
(0.75 ppm); and (Koh et al., 2015) (2.6 ppm). Inclusion of these studies
is another sensitivity analysis that would lend more weight to a LOAEC
in the range of 2 ppm, rather than the second cluster at 7−8 ppm.
Adding in the less certain LOAECs above the median yields LOAECs of
0.75 ppm (Swaen et al., 2010), 2.6 ppm (Koh et al., 2015), and
0.04 ppm (Li et al., 2018), all below values of 7−8 ppm.
These sensitivity analyses are summarised in Fig. 3.
Fig. 3 shows the effect on average LOAECs and average quality
scores of including only high-quality studies with a LOAEC near 2 ppm
versus all high-quality studies, as well as the effect on LOAECs and
scores of defining high quality based on broader groups (i.e. scores
above the median and less certain LOAECs). Thus, while the stronger
studies would slightly favour a LOAEC in the 7−8 ppm range, rather
than the 2 ppm range, various sensitivity analyses based on the less
certain data would favour a LOAEC in the 2 ppm range, without a large
decrease in the average quality score.

3.2.1.2. Derivation of NOAECs. For first tertile studies, the more certain
NOAECs are 0.25 ppm (Swaen et al., 2010), 2.9 ppm (Schnatter et al.,
2010), 2.2 ppm (Ward et al., 1996), 0.19 ppm (Collins et al., 1991),
0.21 ppm (Koh et al., 2015), and 1.7 ppm (Pesatori et al., 2009). Thus,
there are three studies that suggest a NOAEC near 2−3 ppm, and three
studies that suggest a NOAEC near 0.2−0.25 ppm. When studies that
scored above the median and that show a more certain NOAEC are
factored in, the NOAECs are 0.55 ppm (Collins et al., 1997), 0.81 ppm
(Khuder et al., 1999), and 0.33 ppm (Tsai et al., 2004). Collectively, all
studies above the median with more definitive NOAECs show four
studies near 0.2−0.3 ppm, two studies near 0.6−0.8 ppm, and three
studies near 2−3 ppm.

3.2.2. Genotoxicity
The quality scores for genotoxicity studies that scored in the first
two tertiles are presented in Table 3.
3.2.2.1. Factory workers. Of the 21 studies in the top tertile, ten studies
were among factory workers, five among fuel handling workers and six
among workers exposed to traffic and ambient air. In factory workers,
the five studies with more certain LOAECs were (Qu et al., 2003)
(LOAEC=3.07 ppm), (Xing et al., 2010) (LOAEC>1.6 ppm), (Zhang
et al., 2012) (LOAEC>2.64 ppm), (Zhang et al., 2007)
(LOAEC=13.6 ppm) and (Zhang et al., 2014) (LOAEC=2 ppm). The
top tertile study generating a less certain LOAEC (>0.56 ppm) was
(Kim et al., 2004a) due to the presence of PAH co-exposures.
3.2.2.2. Fuel workers. Three studies (Carere et al., 1995; Pandey et al.,
2008 and Rekhadevi et al., 2010) in the top tertile were associated with
a more certain LOAEC and none with a less certain LOAEC. The three
studies showed similar LOAECs of 2 ppm, 2 ppm, and > 1 ppm,
respectively. A NOAEC in the Carere study for micronuclei is
0.47 ppm and in the Pandey study ∼0.9 ppm. The quality scores of
the first tertile fuel studies (14.5) are lower than those from the factory
setting (17.25).
3.2.2.3. Traffic/ambient air. There were only two studies (Leopardi
et al., NOAEC=0.003 ppm; Maffei et al., LOAEC=0.008 ppm) in the
top tertile which produced a more certain LOAEC or NOAEC. Violante
et al. (15.5) has a less certain NOAEC of 0.005 ppm and Angelini (14.5)
has a less certain LOAEC of 0.006 ppm. Since the exposure
concentrations present in the traffic/ambient air studies are lower
than other NOAECs based on fuel and factory studies, this group of
studies does not add meaningful information to the NOAEC analysis.
Since the single top tertile study that showed a more certain LOAEC is
of lower quality (13.5) than studies from the factory and fuel sectors
(average=16.07), this group of studies also does not add meaningful
information to the LOAEC analysis. Thus, these studies are not
subsequently considered.
3.2.2.4. Derivation of LOAECs. The highest quality studies (i.e. first
tertile) that generated a more certain LOAEC originated from the
factory and fuel study scenarios. There were five such studies from
the factory scenario: Qu et al. (LOAEC=3.07 ppm), Xing et al.
(LOAEC>1.6 ppm), Zhang et al., 2012 (LOAEC>2.64 ppm), Zhang
et al., 2007 (LOAEC=13.6 ppm), and Zhang et al., 2014 (LOAEC=2 ppm).
Since Zhang et al., 2007 studied mainly higher exposures, it can be
excluded. The four remaining high-quality factory studies result in an
average LOAEC of 2.33 ppm. This is the best supported LOAEC (leading
case) since it is a weighted average of the highest quality studies, with
an average quality score of 17.25. When the three additional studies
from the fuel scenario: Carere et al. (2 ppm), Rekhadevi et al. (1 ppm),
and Pandey et al. (2 ppm) are added, the resulting LOAEC is 2.04 ppm,
which can be regarded as the sensitivity analysis based on the next
highest quality studies.
If high quality is defined more inclusively as studies above the
median, adding the one additional study from the factory setting with a
more certain LOAEC (Eastmond et al., 1.29 ppm) with the other first
tertile more certain factory studies, results in an average LOAEC of
2.12 ppm. While the average quality score in this sensitivity analysis
decreases to 16.3 (from 17.25), it supports a LOAEC of approximately
2 ppm.
There were no additional studies from the fuel or ambient scenarios
which generated more certain LOAECs above the median score of
12.5. Collectively, all high certainty LOAECs above the median score
from the factory and fuel sector result in a LOAEC of 1.95 ppm (average
score – 14.85). Although average quality decreases somewhat, this also
supports an aggregate LOAEC of ∼2 ppm. These sensitivity analyses are
displayed in Fig. 4.
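The aggregate genotoxicity LOAECs quoted in section 3.2.2.4 can be reproduced as simple arithmetic means of the study-level values. The sketch below is only a check of that arithmetic: it assumes that LOAECs reported as ">x ppm" enter the average as x, and the dictionary and function names are illustrative.

```python
from statistics import mean

# More certain LOAECs (ppm) as listed in the text. Values reported as ">x"
# are entered as x, which is an assumption of this sketch.
factory_top_tertile = {
    "Qu et al., 2003": 3.07,
    "Xing et al., 2010": 1.6,
    "Zhang et al., 2012": 2.64,
    "Zhang et al., 2014": 2.0,
}  # Zhang et al., 2007 (13.6 ppm) is excluded, as in the text
fuel_top_tertile = {
    "Carere et al., 1995": 2.0,
    "Rekhadevi et al., 2010": 1.0,
    "Pandey et al., 2008": 2.0,
}
factory_above_median = {"Eastmond et al.": 1.29}

def average_loaec(*groups):
    """Unweighted mean LOAEC (ppm) over all studies in the given groups."""
    values = [value for group in groups for value in group.values()]
    return round(mean(values), 2)

leading_case = average_loaec(factory_top_tertile)
with_fuel = average_loaec(factory_top_tertile, fuel_top_tertile)
factory_median = average_loaec(factory_top_tertile, factory_above_median)
all_studies = average_loaec(factory_top_tertile, fuel_top_tertile, factory_above_median)

print(leading_case, with_fuel, factory_median, all_studies)  # 2.33 2.04 2.12 1.95
```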
Confounding factors:
Confounder control within reviewed studies was scored as part of the quality assessment of studies in the following way:

OD-2 CONTROL OF POTENTIAL CONFOUNDERS (0–3). This criterion
assesses whether confounders were appropriately controlled either
by restriction, matching, stratification techniques, or regression
modelling. Significance testing (e.g. whether the proportion of females
was different for exposed vs. control groups) was generally not considered
a form of control, although if frequencies of such potential
confounders were virtually the same between groups, this was considered
a weak form of assessment of confounding. Generally, a score of
3 was rarely attained, as it implies all potential confounders were
controlled. The important confounders included age, sex, smoking,
alcohol use, medications, diet (iron depletion, meat consumption, folic
acid/B12), X-rays, recent/current diseases/infections, menstruation/
menopause, other diseases, and other exposures.
Strengths and weaknesses:
Our approach can be summarized as follows: (a) provide rationale
for the use of haematotoxicity and genotoxicity studies in developing an
OEL, (b) perform literature searches to identify such studies performed
on human populations, (c) screen out un-useable studies (i.e. reviews,
non-human data, no benzene concentrations, etc.), (d) develop/adapt a
scheme to rank the quality of the remaining studies, (e) determine
LOAECs and NOAECs for the highest quality studies, (f) provide criteria
and specify more and less certain LOAECs and NOAECs, (g) determine
aggregate LOAECs and NOAECs for highest quality studies and more
certain LOAECs/NOAECs, (h) test results from (g) with sensitivity
analyses for different determinations of “high quality” studies, (i) use
results and additional assessment factors to determine an OEL for
benzene exposure.
Since many parts of this approach are standard practices in developing
an OEL (i.e. items b, c, e, g, and i), they are neither a strength nor
weakness. We regard the primary strengths of our approach to be the
use of a quality scoring scheme to identify the strongest studies, the
criteria for identifying more and less certain LOAECs and NOAECs, and
the use of sensitivity analyses to test our conclusions. We based our study quality ranking system on that developed previously
by (Vlaanderen et al., 2008). Use of this scheme, and the
adaptations we made to it, allowed the development of an OEL based on
the highest quality information.

Weaknesses

Use of quality scoring for observational research is not without
detractors. Greenland and O'Rourke (2001) and Greenland (1994)
argue that an overall quality score is likely misleading and an oversimplification
of the more important components of quality, which in
themselves can be more informative. They argue that the direction of
bias can be different for different components of quality. This is a valid
point. The scheme we used, adapted from Vlaanderen et al. (2008), has
13 quality dimensions, including seven that are related to exposure
assessment. A future exercise could be to examine each of these 13
dimensions for effects on the LOAECs and/or NOAECs. However, the
lack of a gold standard to evaluate bias can detract from this approach.
In addition, the Vlaanderen et al. (2008) tool has been used successfully
in risk assessments on asbestos (Burdorf and Heederik, 2011),
arsenic (Tsuji et al., 2015) and myelodysplastic syndrome (Li and
Schnatter, 2018). Thus, there is some precedent in using total study
quality assessment for risk assessment, whilst use of component quality
scores (rather than a total score) has been the recommended approach
for meta-analyses. It is also intuitive that studies ranked highly on
several simultaneous quality dimensions should produce results that are
more reliable and scientifically rigorous. In addition, we have used the
quality scoring as a filtering activity to focus on the highest quality
studies, rather than using the score as a weighting factor to calculate an
OEL. Further, the potential for bias from the use of quality scores is reduced by
incorporating the results of multiple high quality studies, rather than
selecting a single “best study”.

Another difficulty in quality assessment is assigning (and summing)
component scores. Our approach yielded a total possible score of 24,
broken down as follows: exposure assessment 7, statistical analysis 5,
confounder control 3, selection/information bias 3, power/precision 2,
sensitivity analyses 2, and outcome blinding 2. Admittedly, the value of
scores for each component is somewhat arbitrary, but this approach
reflects our initial assessment of both the importance of each area and the ability to distinguish studies based on the written reports. Clearly,
future research that explores disaggregation of the components should
be encouraged.
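The component breakdown above can be written out as a small sketch showing how the maxima sum to 24 and how a total score would be assembled. The component maxima come from the text; the function name and the example study scores are hypothetical and do not correspond to any study in the review.

```python
# Maximum points per component, as stated in the text (total 24).
MAX_POINTS = {
    "exposure assessment": 7,
    "statistical analysis": 5,
    "confounder control": 3,
    "selection/information bias": 3,
    "power/precision": 2,
    "sensitivity analyses": 2,
    "outcome blinding": 2,
}

def total_quality_score(component_scores):
    """Sum component scores after checking each against its maximum."""
    total = 0.0
    for name, maximum in MAX_POINTS.items():
        score = component_scores.get(name, 0)
        if not 0 <= score <= maximum:
            raise ValueError(f"{name}: score {score} outside 0-{maximum}")
        total += score
    return total

# Hypothetical scores for illustration only.
hypothetical_study = {
    "exposure assessment": 6,
    "statistical analysis": 4,
    "confounder control": 2,
    "selection/information bias": 2.5,
    "power/precision": 2,
    "sensitivity analyses": 1.5,
    "outcome blinding": 2,
}

print(sum(MAX_POINTS.values()))                 # 24
print(total_quality_score(hypothetical_study))  # 20.0
```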
The identification of LOAECs and NOAECs is also not always
straightforward in human observational studies. We made our determination
of both LOAECs and NOAECs as transparent as possible.
Most often, results are available for exposure categories (e.g.<1 ppm,
1−5 ppm, etc.). We employed a hierarchical preference for central
tendency measures, if they were available for a given LOAEC or NOAEC
category, making choices for specific concentrations less arbitrary. We
often erred on the side of conservatism in identifying LOAECs and
NOAECs. This meant that the lowest LOAEC was sought, regardless of
whether it was from the best or most complete analysis. Also of note is
that when LOAECs were distributed bimodally, we used the lower
group of LOAECs, even if these were of slightly lower quality.
As such LOAECs and NOAECs carried with them some uncertainty,
even if a study was ranked highly. This was partially accounted for by
classifying LOAECs and NOAECs as more or less certain. This approach
allowed for more robust LOAECs and NOAECs to carry more interpretive
weight, which we regard as a strength of the approach.
The definition of “high quality” was made after inspection of the
distribution of quality scores. Observing no “natural breaks” we relied
on two definitions of high quality – the top tertile and top half of the
distribution. Using both definitions of high quality allowed us to test the
robustness of results using sensitivity analyses, another strength of our
approach.
One possible criticism of our approach is that we define NOAECs
based on the lack of statistical significance (i.e. a true LOAEC might be
misidentified as a NOAEC due to a small sample size). However, we
would downgrade such an effect as less certain if the NOAEC category
was based on fewer than 20 workers. This approach addresses the
concern somewhat, since this minimum sample size of 20 for a nonsignificant
effect assures that a very large effect (which would be possible
for even smaller samples) would not be regarded as more certain.
Also germane to this fact is that our sample size criterion for an uncertain
NOAEC (n=20) is greater than that for an uncertain LOAEC
(n=10). An uncertain LOAEC based on small numbers could produce a
spurious statistically significant effect (i.e. a true NOAEC was misidentified
as a LOAEC because of the large variability associated with a
small sample size). Another approach would have been to use an effect
size criterion, such as 1.5 fold increased incidence of an effect. While
this approach should be investigated further, we felt that for risk assessment
purposes, statistical significance (or lack thereof) is a more
accepted norm for differentiating likely effects from no effects. Further
work could address biological significance for both genotoxic and
haematotoxic endpoints, since many of the statistically significant effects
in large groups still fell within normal reference values. The rationale
for considering these as meaningful is that they imply that a
greater subset of the population does have abnormal values, and a
statistically meaningful shift could portend more serious effects. For
genotoxicity we did focus on chromosomal and micronuclei effects,
rather than on SCE and DNA tail effects since the former have been
clearly implicated as predictive factors for cancer (Bonassi et al., 2000,
2007; Bonassi et al., 2008; Murgia et al., 2008).
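The sample-size part of the certainty classification discussed above can be sketched as follows. This covers only the group-size thresholds quoted in the text (20 workers for a NOAEC, 10 for a LOAEC); reading the LOAEC rule as "fewer than 10" is an assumption made by analogy with the NOAEC rule, and other considerations that affected certainty (for example co-exposures) are not modelled. The names are illustrative.

```python
# Minimum group sizes quoted in the text. Treating the LOAEC threshold as
# "fewer than 10 workers" is an assumption of this sketch.
MIN_WORKERS = {"NOAEC": 20, "LOAEC": 10}

def passes_sample_size_criterion(kind, n_workers):
    """True if the exposure category is large enough not to be downgraded
    to 'less certain' on sample size alone."""
    if kind not in MIN_WORKERS:
        raise ValueError(f"unknown kind: {kind!r}")
    return n_workers >= MIN_WORKERS[kind]

print(passes_sample_size_criterion("NOAEC", 15))  # False
print(passes_sample_size_criterion("LOAEC", 15))  # True
```

The stricter threshold for NOAECs reflects the point made in the text that a non-significant result in a small group is weaker evidence of no effect than a significant result is of an effect.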
Another strength of our approach is the use of benzene health
effects that occur earlier than leukaemia mortality. It is increasingly thought that
diseases such as leukaemia represent the adverse outcome at the end of
a series of prior key events (North et al., 2020a). By preventing the key
events in the series, one protects from the final adverse outcome.
Two potential weaknesses to our use of earlier key events are (a)
potential for some additional uncertainty where indirect measurements
of the key event are used and (b) a greater potential to set limits that are
more restrictive than necessary to obtain health protection. While the
target organ for benzene toxicity is bone marrow, all studies sample
peripheral blood, which may leave some uncertainty. We considered it
likely a relatively small uncertainty (a factor of 2) which is also in line
with RAC’s view, because disruption of the high, continuous cellular
productivity of bone marrow should be readily observable in peripheral
blood, and the use of peripheral blood as an indicator of bone marrow
status has widespread and long-standing clinical use. Future research
may establish how well correlated peripheral blood and bone marrow
are in toxicology, but for now it is still addressed through applying an
assessment factor.

Applicant's summary and conclusion

Conclusions:
This study demonstrates that the rich and diverse literature of worker studies on benzene can be assessed using quality scoring methods, allowing conclusions to be based on methodologically-sound studies.
Rejection of direct-acting mutagenicity as a MOA leads logically to consideration of threshold responses, in which haematotoxic and/or genotoxic events represent observable events preceding progression to MDS or leukaemia (AML). Consequently, a health-based OEL based on haematotoxicity and genotoxicity data is proposed.
A full literature review and detailed review of 124 worker studies indicates that each study has shortcomings, and many have multiple shortcomings. However, studies that define both exposure and effects robustly can be used for OEL setting. The study quality scoring process adapted from Vlaanderen et al. (2008) was successfully applied to 36 haematology studies (31 unique populations) and 77 genotoxicity studies (56 unique populations). For both haematotoxicity and genotoxicity, these best quality studies support effects at 2 ppm (8 h TWA) or higher and no effects at 0.5 ppm (8 h TWA) and lower. Applying an assessment factor of 4 to the LOAEC of 2 ppm would give a NOAEC of 0.5 ppm based on both haematotoxicity and genetic toxicity. The assessment factor of 4 (2 for dose-response and 2 for intraspecies considerations) is well supported.
The projected NOAEC value of 0.5 ppm is reinforced as being appropriate by the actual NOAECs being very similar (0.5 and 0.59 ppm).
Therefore, an OEL of 0.5 ppm (8 h TWA) could be supported by the data from this objective review. However, the use of peripheral blood measures of bone marrow effects introduces some scientific uncertainty; thus, until the issue of bone marrow sensitivity compared to that of peripheral blood is resolved, an extra assessment factor of two is applied.
An OEL of 0.25 ppm (8 h TWA) for benzene is the best estimate based on available human data.
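The chain from LOAEC to proposed OEL stated in these conclusions is a sequence of divisions by assessment factors, which the following sketch reproduces. The factor values are those given in the text; the function and variable names are illustrative.

```python
def divide_by_factors(concentration_ppm, factors):
    """Divide a concentration (ppm, 8 h TWA) by each assessment factor in turn."""
    result = concentration_ppm
    for factor in factors:
        result /= factor
    return result

loaec_ppm = 2.0
# Factor of 4 = 2 (dose-response) x 2 (intraspecies), as stated in the text.
projected_noaec_ppm = divide_by_factors(loaec_ppm, [2, 2])
# Extra factor of 2 for peripheral blood versus bone marrow uncertainty.
oel_ppm = divide_by_factors(projected_noaec_ppm, [2])

print(projected_noaec_ppm, oel_ppm)  # 0.5 0.25
```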
Executive summary:

This paper derives an occupational exposure limit for benzene using quality assessed data. Seventy-seven genotoxicity and 36 haematotoxicity studies in workers were scored for study quality with an adapted tool based on that of Vlaanderen et al., 2008 (Environ. Health Perspect. 116, 1700−5). These endpoints were selected as they are the most sensitive and relevant to the proposed mode of action (MOA) and protecting against these will protect against benzene carcinogenicity.
Lowest and No Observed Adverse Effect Concentrations (LOAECs and NOAECs) were derived from the highest quality studies (i.e. those ranked in the top tertile or top half) and further assessed as being “more certain” or “less certain”. Several sensitivity analyses were conducted to assess whether alternative “high quality” constructs affected conclusions.
The lowest haematotoxicity LOAECs showed effects near 2 ppm (8 h TWA), and no effects at 0.59 ppm. For genotoxicity, studies also showed effects near 2 ppm and showed no effects at about 0.69 ppm. Several sensitivity analyses supported these observations. These data define a benzene LOAEC of 2 ppm (8 h TWA) and a NOAEC of 0.5 ppm (8 h TWA).
Allowing for possible subclinical effects in bone marrow not apparent in studies of peripheral blood endpoints, an OEL of 0.25 ppm (8 h TWA) is proposed.