Registration Dossier

Toxicological information

Epidemiological data

Administrative data

epidemiological data
Type of information:
experimental study
Adequacy of study:
key study
Study period:
Reliability:
1 (reliable without restriction)
Rationale for reliability incl. deficiencies:
test procedure in accordance with generally accepted scientific standards and described in sufficient detail

Data source

Reference Type:

Materials and methods

Study type:
other: Selection of studies reviewed, largely cross-sectional.
Endpoint addressed:
repeated dose toxicity: inhalation
genetic toxicity
other: haematotoxicity
Test guideline
no guideline followed
GLP compliance:

Test material

Constituent 1
Chemical structure
Reference substance name:
EC Number:
EC Name:
Cas Number:
Molecular formula:
Test material form:
liquid: volatile


Type of population:
Ethical approval:
not applicable
Exposure assessment:

Results and discussion

Methods are described that were used to (a) identify appropriate repeated dose haematological and genotoxic studies of benzene exposure, (b) classify such studies with regard to epidemiological quality criteria, (c) identify unique study populations, (d) select the highest quality studies, (e) select Lowest Observed Adverse Effect Concentrations (LOAECs) and No Observed Adverse Effect Concentrations (NOAECs), and (f) classify LOAECs and NOAECs with respect to certainty.
Reviews of the immunotoxicity of benzene (ATSDR, 2007, 2015; ECHA, 2018a, b; IARC, 2019; Veraldi et al., 2006) were initially considered. This literature prompted the observations that:
i)   In well-designed worker studies that examined both haematological and immunological data, immunological effects are not seen at benzene exposure levels below those giving haematological effects (e.g. Lan et al., 2004; Qu et al., 2003).
ii)   MOA considerations indicated that genetic toxicity and/or haematotoxicity were key events preceding carcinogenic outcomes (North et al., 2020a).
Literature search strategy
The focus of the search strategy was to identify studies on hematotoxicity and genotoxicity in a relevant dose range of interest using PubMed, Science Direct, TOXNET and EPA databases. Studies before 1980 (hematotoxicity) and 1990 (genotoxicity) at much higher exposures were excluded from the search.
1995 and 2015 and key studies on any aspect of haematotoxicity were gleaned from these early searches. The keywords used were hematox#, myelotox#, lymphatic, blood, autoimmune, immunotox, anemia, cytopenia, hematopo#, epidemiol#, worker#, bimonitor# AND benzene (# denotes wildcard). This strategy identified 286 potential haematology studies published after 1979 for inclusion. This list was supplemented with reference lists from (ATSDR, 2007, 2015; ECHA, 2018a, b; IARC, 2018) which added another 35 studies. Titles of the 321 studies were reviewed. Of these, 206 could be excluded since they were not epidemiological studies. The full text of the remaining 115 studies was then reviewed. 72 studies were eliminated at this stage, which failed the screening criteria for various reasons (did not study benzene per se, pre-1980 study, etc.). This left 43 studies to be fully assessed for quality by the panel of scorers.
The following key words were used to identify relevant genotoxicity studies: genotoxicity OR micronucleus OR chromosome aberration OR DNA damage AND S-phenylmercapturic acid OR blood benzene OR urinary benzene OR benzene. An estimated 1100 potential studies remained following exclusion of duplicates. Of these, about 900 were eliminated as non-epidemiological studies, and the 200 studies that remained were reviewed. A further 106 studies were excluded, which resulted in 94 studies for full quality assessment.
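The screening counts reported above can be checked with simple arithmetic. The following Python sketch reproduces the two selection funnels. The function name `funnel` and the stage labels are illustrative only, and the genotoxicity figures are the approximate counts given in the text, read on the assumption that about 900 records were eliminated as non-epidemiological.

```python
def funnel(start, steps):
    """Apply (label, change) steps to a starting count and return the running counts."""
    counts = [("start", start)]
    total = start
    for label, change in steps:
        total += change
        counts.append((label, total))
    return counts


# Haematology searches: counts as reported in the text.
haematology = funnel(286, [
    ("added from review reference lists", +35),
    ("excluded on title (not epidemiological)", -206),
    ("excluded on full text", -72),
])

# Genotoxicity searches: approximate counts from the text (assumed reading).
genotoxicity = funnel(1100, [
    ("eliminated as non-epidemiological", -900),
    ("excluded on review", -106),
])

for name, stages in (("haematology", haematology), ("genotoxicity", genotoxicity)):
    print(name)
    for label, n in stages:
        print(f"  {label}: {n}")
```

The final counts (43 haematology and 94 genotoxicity studies) are the numbers put forward for quality assessment.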
Epidemiological study quality assessment
All studies that were selected from the literature review were put forward for assessment of their epidemiological quality. The scoring scheme from (Vlaanderen et al., 2008) was used as the most appropriate starting point. This was due to its (a) applicability to epidemiological research, (b) applicability to risk assessment, of which OEL derivation is a form, and (c) sharp focus on exposure assessment (EA), which is often a weak link in these studies.
Two aspects in the (Vlaanderen et al., 2008) guidelines were revised for the present exercise. First, most studies on haematotoxicity and genotoxicity of benzene are cross-sectional studies. Thus, the focus is on recent exposure over days, weeks, or months rather than a longer time span. Many of the studies examined individual exposure, and many also included biomonitoring results. Accordingly, Exposure Assessment criteria were modified to suit studies of a shorter-term focus, so that exposure measurement shortly preceded phlebotomy by weeks to months, for example. Appropriate scoring of exposure categorisation was prioritised, to aid the process of deriving LOAECs and NOAECs.
Key screening criteria are listed below:
1 Study Design (cohort, case control or cross-sectional)
2 Benzene exposure (benzene specifically needed to be assessed)
3 Ratio scale (the index of exposure could not be rankings or other non-quantitative indices)
4 Statistical analysis (the analysis had to generate p-values or confidence intervals)
5 Subject inclusion/exclusion criteria (needed to be specified to evaluate selection/information bias)
6 Outcome measures needed to be defined using recognised norms.
7 Confounding (recognition that other influences on the outcome needed to be assessed)
For the 43 haematological studies put forward for assessment, 36 passed the screening criteria, while 77 of the 94 genotoxicity studies passed the screen.
Once a study passed the screen, the adapted (Vlaanderen et al., 2008) criteria were used to score it against seven EA criteria and six other design criteria, which are more fully described in the supplementary material (Appendix 2). The seven EA criteria carried a total possible score of 7. Other aspects of the study accounted for a possible score of 17. Thus, the highest total possible score was 24.
A given study needed to pass all these criteria to be subsequently scored. In cases of uncertainty, the benefit of the doubt was given to pass the study.
Exposure assessment (EA) criteria
The seven EA criteria, each scored as 0, 0.5, or 1, were as follows:
How robust was collection and analysis of the underlying exposure data - in order to predict a study participant’s exposure?
How well described are the factors resulting in exposure variability in the work setting?
How well are the (current or past) exposure data assigned to the exposure categories (individuals or similar exposure groups)?
Is an appropriate exposure metric used? In the case of shorter-term studies on haematotoxicity and genotoxicity, preference is given to metrics that summarise concentrations in the weeks or months immediately preceding biological sampling.
Was an appropriate biomarker used that results in higher exposure specificity?
(Preference was given to studies that provided both a workplace air measurement and an appropriate, specific biomarker in the body (usually urine) to encompass whole body exposure from all possible routes. Measurements from these two sources were required to support one another with respect to benzene dose. Appendix 3 provides information on appropriate biomarkers of exposure).
Was information on study participants sufficient to assign individuals accurately to exposure groups?
Was specific mention given of whether the exposure assessors were blinded when creating exposure groupings with respect to the disease or outcome measure under evaluation?
Other design criteria
OD-1 STRENGTH OF THE STATISTICAL ANALYSIS (0–5). This covers (a) whether multivariate methods were used for best estimation of the effect versus covariates, (b) whether an evaluation of how well the data fit the model was performed, (c) whether the model covered at least three exposure categories, (d) whether the exposure categories were sufficiently accurate to permit a good estimate of a NOAEC or LOAEC, and (e) whether multiple comparisons were accounted for, if applicable.
Were confounders appropriately controlled, either by restriction, matching, stratification techniques, or regression modelling? Generally, a score of 3 was rarely attained, as it implies all potential confounders were controlled. The important confounders included age, sex, smoking, alcohol use, medications, diet (iron depletion, meat consumption, folic acid/B12), X-rays, recent/current diseases/infections, menstruation/menopause, other diseases, and other exposures.
OD-3 BIAS (0–3).
In general, the two forms of bias that were assessed were selection bias and information bias. Selection bias arises from the choice of the exposed population or controls (e.g. comparing office worker controls with an exposed population of gas station attendants would be biased). Information bias occurs if information is measured or classified differently between exposed and control populations. Both biases can occur if controls differ from exposed workers in aspects other than exposure. A score of three is rarely attained; it means that the control population was like the exposed population in every respect, except for benzene exposure, and the study population is a random subset of the larger target population.
If at least one key assumption on exposure, disease, a key confounder or a statistical analysis was made, a study is scored higher, while the presence of multiple sensitivity analyses would result in the highest possible score (2).
To select critical studies for further interpretation, some cognisance of study size and the accuracy of risk estimates is appropriate. The scores of 0, 1 and 2 are defined by the width of the confidence interval and number of study subjects. One should preferably use the width of the confidence interval as the leading criterion, as it incorporates the variability of the response in the population investigated.
OD-6 OUTCOME BLINDING (0–2). This is an important criterion that, if violated, merits study exclusion. Persons scoring outcomes should have no knowledge of exposure status. This could be an issue if controls were from a different facility or tested at a different time. If it is clearly documented that laboratory personnel were blinded, the study scored highly. A score of 0 is applicable if there is some doubt as to the blinding, and a study should be excluded if those determining the outcome knew the exposure status of the study participant.
Detailed instructions were developed (Appendix 2) to serve as the basis for study quality scoring.
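The structure of the scoring scheme can be sketched as follows. This is an illustration under stated assumptions, not the panel's actual instructions (those are in Appendix 2): the maxima for OD-1, OD-3 and OD-6 are stated in the headings above, while the maxima assumed for OD-2 (3), OD-4 (2) and OD-5 (2), and the assignment of those labels to the unnumbered paragraphs on confounding, sensitivity analyses and study size, are inferred from the prose and from the stated total of 17. The example study scores are hypothetical.

```python
EA_ALLOWED = (0, 0.5, 1)  # permitted score for each of the seven EA criteria

# Maximum score per "other design" criterion (OD-2, OD-4, OD-5 are inferred).
OD_MAX = {"OD-1": 5, "OD-2": 3, "OD-3": 3, "OD-4": 2, "OD-5": 2, "OD-6": 2}

# Highest attainable total: 7 (EA) + 17 (other design) = 24.
MAX_TOTAL = 7 * max(EA_ALLOWED) + sum(OD_MAX.values())


def total_score(ea_scores, od_scores):
    """Validate one study's criterion scores and return the total quality score."""
    if len(ea_scores) != 7:
        raise ValueError("expected seven EA scores")
    if any(score not in EA_ALLOWED for score in ea_scores):
        raise ValueError("EA scores must be 0, 0.5 or 1")
    if set(od_scores) != set(OD_MAX):
        raise ValueError("expected exactly the six OD criteria")
    for key, score in od_scores.items():
        if not 0 <= score <= OD_MAX[key]:
            raise ValueError(f"{key} score out of range")
    return sum(ea_scores) + sum(od_scores.values())


# Hypothetical study, for illustration only.
example_ea = [1, 0.5, 1, 1, 0.5, 1, 0]
example_od = {"OD-1": 3, "OD-2": 2, "OD-3": 2, "OD-4": 1, "OD-5": 1, "OD-6": 2}
example_total = total_score(example_ea, example_od)
print(f"{example_total} of {MAX_TOTAL}")
```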
Consensus scoring
Each study was scored by a core panel of five experts with expertise in toxicology, epidemiology, statistics and exposure assessment, each blinded to the others’ scores. Two other experts served as additional toxicology experts to aid accurate rankings for specific studies. A test set of six genotoxicity studies was randomly selected and scored with an initial set of instructions. Based on feedback, the clarity of the instructions was improved and a second set of papers was scored. Only minimal changes were then made, and the initial test set of studies was rescored.
Each study was examined for scoring disagreements. A consensus score for each criterion was agreed upon in conferences in which all experts participated. The consensus score for each individual study was recorded in the Supplementary Material.
For studies in the top two tertiles of the quality rankings we wrote comprehensive study summaries which are available in the Supplementary Material.
Defining unique populations
Some papers reported on the same study populations in different ways. Where study populations were found to overlap, the study with the lowest LOAEC was used. Where insufficient information was available, study populations were assumed to be unique.
Defining high quality studies
Haematotoxicity study scores ranged from 8 to 20, while genotoxicity study scores ranged from 6 to 20. High quality studies were defined using tertiles of the quality score. Studies in the top tertile (a) and studies above the median (b) were defined as high quality studies.
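The two definitions of "high quality" can be illustrated with a short sketch. The scores below are hypothetical, and the tie-handling rule (every study sharing the cut-off score is kept in the top tertile) is an assumption; the text only notes that ties affected the stratification. The median definition is implemented as "at or above the median", the wording used for the reported results.

```python
import math
from statistics import median


def top_tertile(scores):
    """Studies in the top third by rank; ties at the cut-off score are kept (assumption)."""
    ranked = sorted(scores.values(), reverse=True)
    cutoff = ranked[math.ceil(len(ranked) / 3) - 1]
    return {name for name, score in scores.items() if score >= cutoff}


def at_or_above_median(scores):
    """Studies scoring at or above the median quality score."""
    mid = median(scores.values())
    return {name for name, score in scores.items() if score >= mid}


# Hypothetical quality scores for nine studies.
scores = {"A": 20, "B": 17.5, "C": 14.5, "D": 14.5, "E": 12.5,
          "F": 11, "G": 9, "H": 8, "I": 8}

tertile_set = top_tertile(scores)
median_set = at_or_above_median(scores)
print(sorted(tertile_set), sorted(median_set))
```

Because of the tie at 14.5, four of the nine hypothetical studies fall in the top tertile rather than three.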
Selection of LOAECs and NOAECs for haematology and genotoxicity studies
After each study was scored for quality, we selected LOAECs and NOAECs for all higher quality studies where sufficient information was available. The LOAEC was selected as the lowest dose group (or, less often, the lowest point on a continuous curve) that showed a statistically significant effect versus the control or, in the absence of an unexposed group, vs. the lowest dose group. We sought to identify the lowest “representative” exposure that defined an effect. An analogous procedure was used to define the NOAEC, where we sought to identify the highest representative exposure that did not show an effect due to benzene. We avoided selecting the very highest value if a range of values that showed no effect was studied. For exposure categories for both LOAEC and NOAEC, arithmetic mean of the exposure was given preference; where possible it was calculated if not present; if the arithmetic mean could not be defined, the category midpoint was used. Where populations were overlapping, we selected the study and population with the lowest LOAEC, among all candidate LOAECs.
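The order of preference for the exposure value assigned to a category can be sketched as below. The function name, its parameters and the example values are hypothetical; only the order (reported arithmetic mean, then a calculated mean, then the category midpoint) comes from the text.

```python
from statistics import mean


def representative_exposure(reported_mean=None, measurements=None, category_bounds=None):
    """Return the exposure (ppm) used to represent an exposure category."""
    if reported_mean is not None:
        return reported_mean              # arithmetic mean given in the paper
    if measurements:
        return mean(measurements)         # calculated where the data allow it
    if category_bounds is not None:
        low, high = category_bounds
        return (low + high) / 2           # fall back to the category midpoint
    raise ValueError("no exposure information available")


# Hypothetical inputs, for illustration only.
print(representative_exposure(reported_mean=2.2))
print(representative_exposure(measurements=[1.0, 2.0, 3.0]))
print(representative_exposure(category_bounds=(1.0, 5.0)))
```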
Defining more and less certain LOAECs and NOAECs
A LOAEC/NOAEC was designated as “more certain” by default. The following factors were considered when classifying a LOAEC or NOAEC as “less certain”:
A) Lack of monotonicity in the dose-response.
B) Findings only in subgroups. The relationship is seen only for a subgroup (e.g. males, smokers, genetically susceptible, etc), without an underlying biological rationale, and as such it may be more likely due to chance.
C) Isolated findings. Especially for studies that evaluate several different cell types or genetic outcomes, an isolated finding is more likely to occur by chance, or as the result of multiple comparisons, especially if the unique finding is not supported by other studies.
D) Few study subjects. If a LOAEC was based on 10 or fewer, or a NOAEC was based on 20 or fewer subjects, the finding was considered less certain.
E) Differences in ambient exposure and biomarkers of exposure. When results between air exposure and biomarkers disagreed, the biomarker was given preference, provided that the result could be reliably predicted according to the DFG tables (Kraus et al., 2018). If reliable prediction was not possible, the result was assigned an uncertain rating.
F) The presence of co-exposures that could plausibly explain the study findings, leading to candidate LOAECs or NOAECs being disregarded or considered less certain.
When summarising data, we gave most interpretive weight to studies that were both of high quality and yielded more certain LOAECs and NOAECs.
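Factors A) to F) can be pictured as a set of flags attached to each candidate value. The sketch below is a simplification: in the text the factors "were considered" as part of an expert judgement, whereas here any single flag is treated as sufficient for a "less certain" rating. The class and field names are hypothetical; only the subject-count thresholds in D) are taken directly from the text.

```python
from dataclasses import dataclass

# Factor D: a LOAEC based on 10 or fewer subjects, or a NOAEC on 20 or fewer.
SMALL_N = {"LOAEC": 10, "NOAEC": 20}


@dataclass
class Candidate:
    kind: str                              # "LOAEC" or "NOAEC"
    n_subjects: int
    non_monotonic: bool = False            # A
    subgroup_only: bool = False            # B
    isolated_finding: bool = False         # C
    unresolved_air_vs_biomarker: bool = False  # E
    plausible_coexposure: bool = False     # F


def reasons_less_certain(c):
    """List the factor letters that would flag this candidate as less certain."""
    flags = [("A", c.non_monotonic), ("B", c.subgroup_only),
             ("C", c.isolated_finding), ("D", c.n_subjects <= SMALL_N[c.kind]),
             ("E", c.unresolved_air_vs_biomarker), ("F", c.plausible_coexposure)]
    return [letter for letter, raised in flags if raised]


def certainty(c):
    """'More certain' is the default; any raised flag downgrades it (simplification)."""
    return "less certain" if reasons_less_certain(c) else "more certain"


# Hypothetical candidates, for illustration only.
print(certainty(Candidate("LOAEC", n_subjects=10)))
print(certainty(Candidate("NOAEC", n_subjects=45)))
```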
For genotoxicity studies, many studies were performed in a mixed exposure environment where other possible genotoxins were undoubtedly present, though few studies properly controlled for genotoxicity elicited by compounds other than benzene (e.g. PAHs). To take account of this, studies were stratified into three groups:
(a) Factory studies (e.g. shoe, luggage, toy or handbag manufacture) where benzene was a solvent. In these studies benzene exposure was experienced indoors, as mixtures with toluene, xylene, and less frequently, ethyl benzene (BTEX). However, these other compounds are not genotoxic. Since benzene exposure is higher, there are generally fewer potentially confounding co-exposures.
(b) Petrol studies (usually service station attendants). In these studies benzene exposure occurred outdoors by inhalation while refuelling, due to its presence in fuels.
(c) Traffic/ambient air studies. In these studies benzene exposure is much lower. Released benzene is present in air in a mixture, often with SO2, NOx, particulate matter (PM), ozone, PAHs, and other compounds. Benzene from cigarette smoke can be higher than the amount received from these exposure scenarios. Thus, control for confounders (often justified through ‘exclusions’) is paramount.
Without simultaneous measurement and statistical analysis of each mixture component, it is difficult to know whether the results from these studies are due to benzene, other co-exposures, a mixture, or confounding factors. Since no study in the “traffic/ambient” scenario presented such a statistical analysis, preference is given to studies from (a), then (b), and then (c) when selecting LOAECs and NOAECs.

Any other information on results incl. tables



Quality scoring results for genotoxic and haematological studies

Among the 31 haematology and 56 genotoxicity study populations, the top score in each group was 20 (of a possible 24), which in both cases was due to the (Qu et al., 2003) study. Both haematotoxicity and genotoxicity studies showed wide ranges (8–20 and 6–20, respectively) indicating marked differences in study quality for each body of literature.

Ties in scores for the haematotoxicity studies resulted in initial stratification of 11 studies in the top tertile (score range 14.5–20), 9 studies in the second tertile (score range 11–14), and 16 studies at or above the median score of 12.5. Similarly, for genotoxicity studies, 21 studies are in the first tertile (score range 13.5–20), 17 studies form the second tertile (score range 11–13), and 29 studies are at or above the median score of 12.5.


LOAECs and NOAECs for high quality studies


Derivation of LOAECs

The highest quality studies (i.e. first tertile) that generated a more certain LOAEC were: Qu et al., 2003 (2.26 ppm, neutrophils), Schnatter et al., 2010 (7.8 ppm, neutrophils), Ward et al., 1996 (7.2 ppm, total leukocytes), Lan et al., 2004 (2.2 ppm, various cell types), Rothman et al., 1996 (7.6 ppm, lymphocytes), and Zhang et al., 2016 (2.1 ppm, leukocytes). From these values, a bimodal distribution results, in which there are two clusters of studies: three studies that suggest a LOAEC near 2 ppm and three studies that suggest a LOAEC near 7−8 ppm.

Sensitivity analyses supported a LOAEC around 2 ppm: Looking at the LOAECs in which all studies at or above the median quality score are considered as high quality, the LOAECs are similar, with only (Bogadi-Šare et al., 2003) at 8 ppm added to the above list from the first tertile. Thus, there are four studies suggesting a LOAEC of 7−8 ppm and three studies suggesting a LOAEC near 2 ppm. This alternative definition of high quality is a sensitivity analysis that supports the top tertile result. For the highest quality (top tertile) studies that generated a less certain LOAEC, values were: (Swaen et al., 2010) (0.75 ppm); and (Koh et al., 2015) (2.6 ppm). Inclusion of these studies is another sensitivity analysis that would lend more weight to a LOAEC in the range of 2 ppm, rather than the second cluster at 7−8 ppm. Various sensitivity analyses incorporating all less certain LOAECs above the median did not change this conclusion.


Derivation of NOAECs

For first tertile studies, the more certain NOAECs are 0.25 ppm (Swaen et al., 2010), 2.9 ppm (Schnatter et al., 2010), 2.2 ppm (Ward et al., 1996), 0.19 ppm (Collins et al., 1991), 0.21 ppm (Koh et al., 2015), and 1.7 ppm (Pesatori et al., 2009). Thus, there are three studies that suggest a NOAEC near 2−3 ppm, and three studies that suggest a NOAEC near 0.2−0.25 ppm. When studies that scored above the median and that show a more certain NOAEC are included, the NOAECs are 0.55 ppm (Collins et al., 1997), 0.81 ppm (Khuder et al., 1999), and 0.33 ppm (Tsai et al., 2004). Collectively, all studies above the median with more definitive NOAECs show four studies near 0.2−0.3 ppm, two studies near 0.6−0.8 ppm, and three studies near 2−3 ppm.

Examining studies above the median is justified and increases the number of studies although quality is somewhat lower (i.e. average quality is 16.25 versus 14.93 for NOAECs, and identical (16.67) for LOAECs).

Based upon frequencies, a LOAEC of 7−8 ppm, and a NOAEC of 2−3 ppm is one defensible conclusion from the analysis above. The NOAECs of 2−3 ppm are of a similar magnitude to three LOAECs from high quality studies, which introduces the problem of overlapping NOAECs with LOAECs. An alternative strategy would be to select the higher quality study(/ies) with more certain LOAECs that do not overlap NOAECs from high quality studies. Thus, there are three studies (Qu et al., 2003; Lan et al., 2004; Zhang et al., 2016) that show LOAECs near 2 ppm. If the LOAEC selected is near 2 ppm, a lower NOAEC should be selected. The two studies with the highest NOAEC yet still below 2 ppm are Collins et al., 1997 (0.55 ppm) and Khuder et al., 1999 (0.81 ppm). More studies show NOAECs of 0.2−0.3 ppm (Collins et al., 1991; Koh et al., 2015; Swaen et al., 2010; Tsai et al., 2004). Collins et al., 1991; Swaen et al., 2010; Tsai et al., 2004, all studied exposures < 0.5 ppm, so that a NOAEC was not achievable for those studies. Collectively, the results are not in conflict with a 0.5 ppm NOAEC, which is four times lower than the LOAEC (see Table 5). All sensitivity analyses (using top tertile studies, above median studies, and lower certainty LOAECs and NOAECs) in Table 5 result in LOAECs between 1.98 and 2.19 ppm, and NOAECs of 0.58 and 0.59 ppm. Thus, the result based on first tertile studies (a LOAEC of 2.19 ppm and a NOAEC of 0.59 ppm) is a conservative yet coherent interpretation of this information and is the preferred approach or base case.
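The base-case haematology values can be reproduced from the first-tertile values quoted earlier. The sketch below is a check under two assumptions that are not stated as rules in the text: a 5 ppm cut is used to separate the "near 2 ppm" LOAEC cluster from the "7−8 ppm" cluster, and NOAECs are retained only if they lie below the resulting LOAEC. Both choices happen to reproduce the study selection described above; plain arithmetic means are used.

```python
from statistics import mean

# More certain LOAECs and NOAECs (ppm), first-tertile haematology studies, as quoted in the text.
loaecs = {"Qu 2003": 2.26, "Schnatter 2010": 7.8, "Ward 1996": 7.2,
          "Lan 2004": 2.2, "Rothman 1996": 7.6, "Zhang 2016": 2.1}
noaecs = {"Swaen 2010": 0.25, "Schnatter 2010": 2.9, "Ward 1996": 2.2,
          "Collins 1991": 0.19, "Koh 2015": 0.21, "Pesatori 2009": 1.7}

CLUSTER_SPLIT_PPM = 5.0  # illustrative split between the two LOAEC clusters (assumption)

low_cluster = {k: v for k, v in loaecs.items() if v < CLUSTER_SPLIT_PPM}
base_loaec = mean(list(low_cluster.values()))

# Drop NOAECs that overlap the chosen LOAEC (assumed rule: keep values below it).
retained = {k: v for k, v in noaecs.items() if v < base_loaec}
base_noaec = mean(list(retained.values()))

print(sorted(low_cluster), f"LOAEC ~ {base_loaec:.3f} ppm")
print(sorted(retained), f"NOAEC ~ {base_noaec:.3f} ppm")
```

The two means come out at roughly 2.19 ppm and 0.59 ppm, consistent with the base case quoted above.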



Factory workers

Of the 21 studies in the top tertile, ten studies were among factory workers, five among fuel handlers and six among workers exposed to traffic and ambient air. In factory workers, the five studies with more certain LOAECs were (Qu et al., 2003) (LOAEC=3.07 ppm), (Xing et al., 2010) (LOAEC>1.6 ppm), (Zhang et al., 2012) (LOAEC>2.64 ppm), (Zhang et al., 2007) (LOAEC=13.6 ppm) and (Zhang et al., 2014) (LOAEC=2 ppm). The top tertile study generating a less certain LOAEC (>0.56 ppm) was (Kim et al., 2004a) due to the presence of PAH co-exposures.


Fuel workers

Three studies (Carere et al., 1995; Pandey et al., 2008; and Rekhadevi et al., 2010) in the top tertile were associated with a more certain LOAEC and none with a less certain LOAEC. The three studies showed similar LOAECs of 2 ppm, 2 ppm, and >1 ppm, respectively. The NOAEC for micronuclei in the Carere study is 0.47 ppm, and the NOAEC in the Pandey study is 0.9 ppm. The quality scores of the first tertile fuel studies (14.5) are lower than those from the factory setting (17.25).


Traffic/ambient air

There were only two studies (Leopardi et al., NOAEC=0.003 ppm; Maffei et al., LOAEC=0.008 ppm) in the top tertile which produced a more certain LOAEC or NOAEC. Violante et al. (15.5) has a less certain NOAEC of 0.005 ppm and Angelini (14.5) has a less certain LOAEC of 0.006 ppm. Since the exposure concentrations present in the traffic/ambient air studies are lower than other NOAECs based on fuel and factory studies, this group of studies does not add meaningful information to the NOAEC analysis.

Since the single top tertile study that showed a more certain LOAEC is of lower quality (13.5) than studies from the factory and fuel sectors (average=16.07), this group of studies also does not add meaningful information to the LOAEC analysis. Thus, these studies are not subsequently considered.


Derivation of LOAECs

The highest quality studies (i.e. first tertile) that generated a more certain LOAEC originated from the factory and fuel study scenarios. There were five such studies from the factory scenario: Qu et al., 2003 (LOAEC=3.07 ppm), Xing et al., 2010 (LOAEC>1.6 ppm), Zhang et al., 2012 (LOAEC>2.64 ppm), Zhang et al., 2007 (LOAEC=13.6 ppm), and Zhang et al., 2014 (LOAEC=2 ppm).

Zhang et al., 2007 studied mainly higher exposures, and can therefore be excluded. The four remaining high-quality factory studies result in an average LOAEC of 2.33 ppm. This is the best supported LOAEC (leading case) since it is a weighted average of the highest quality studies, with an average quality score of 17.25. When the three additional studies from the fuel scenario (Carere et al., 2 ppm; Rekhadevi et al., 1 ppm; and Pandey et al., 2 ppm) are added, the resulting LOAEC is 2.04 ppm, which can be regarded as the sensitivity analysis based on the next highest quality studies.

If high quality is defined more inclusively as studies above the median, adding the one additional study from the factory setting with a more certain LOAEC (Eastmond et al., 1.29 ppm) to the other first tertile more certain factory studies results in an average LOAEC of 2.12 ppm. The average quality score in this sensitivity analysis decreases to 16.3 (from 17.25), but still supports a LOAEC of approximately 2 ppm. There were no additional studies from the fuel or ambient scenarios which generated more certain LOAECs above the median score of 12.5. All high certainty LOAECs above the median score from the factory and fuel sectors combined result in a LOAEC of 1.95 ppm (average score 14.85). Although the average quality score has decreased, this also supports an aggregate LOAEC of 2 ppm.
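The leading case and the sensitivity analyses for the genotoxicity LOAEC can be recomputed from the values quoted above. The sketch assumes that LOAECs reported as inequalities (e.g. >1.6 ppm) enter the average at their bound and that a plain arithmetic mean is used; both assumptions reproduce the reported figures. The dictionary and helper names are illustrative.

```python
from statistics import mean

# More certain LOAECs (ppm) as quoted in the text; Zhang et al., 2007 (13.6 ppm) is excluded.
factory_top_tertile = {"Qu 2003": 3.07, "Xing 2010": 1.6, "Zhang 2012": 2.64, "Zhang 2014": 2.0}
fuel_top_tertile = {"Carere 1995": 2.0, "Pandey 2008": 2.0, "Rekhadevi 2010": 1.0}
factory_above_median = {"Eastmond": 1.29}


def average_loaec(*groups):
    """Arithmetic mean of the LOAECs in one or more groups of studies."""
    return mean([value for group in groups for value in group.values()])


leading_case = average_loaec(factory_top_tertile)
with_fuel = average_loaec(factory_top_tertile, fuel_top_tertile)
factory_median = average_loaec(factory_top_tertile, factory_above_median)
all_combined = average_loaec(factory_top_tertile, fuel_top_tertile, factory_above_median)

for label, value in [("factory, top tertile", leading_case),
                     ("factory + fuel, top tertile", with_fuel),
                     ("factory, above median", factory_median),
                     ("factory + fuel, above median", all_combined)]:
    print(f"{label}: {value:.3f} ppm")
```

The four averages correspond to the reported 2.33, 2.04, 2.12 and 1.95 ppm.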

Less certain LOAECs were also considered. Including Kim et al., 2004a (>0.56 ppm; potential confounding by PAH exposure) gives an average LOAEC for all factory studies in the first tertile of 1.97 ppm (quality score of 17.10). Other factory studies with a less certain LOAEC were Bogadi-Šare et al., 2003 (LOAEC=13 ppm) and Holz et al., 1995 (LOAEC=0.6–1 ppm). The LOAECs from Bogadi-Šare and Holz differ by more than an order of magnitude, thus sensitivity analyses are not warranted.

The leading case LOAEC of 2.33 ppm is supported by the leading sensitivity analyses which account for more studies with a lower quality score and suggest slightly lower LOAECs near 2 ppm. Interpreted with due regard to quality, in aggregate the literature supports a LOAEC of 2 ppm.

Derivation of NOAECs

Three studies from the factory scenario suggest NOAECs: Bogadi-Šare et al., 1997a (8 ppm), Zhang et al., 2011 (4.95 ppm) and Basso et al., 2011 (0.029 ppm). These studies differ by more than two orders of magnitude and as such, do not offer a good “base case” on which to justify a NOAEC. We face the problem of a NOAEC that is higher than the LOAEC. Despite the difficulty in isolating an effect of benzene in impure fuel and (especially) ambient studies, they are the best avenue at present for estimating a NOAEC for genotoxicity. In the fuel scenario, two studies scored in the first tertile and were characterized by more certain NOAECs: Carere et al. (1995) (0.47 ppm) and Pandey et al. (2008) (0.9 ppm). Combining these gives an average NOAEC of 0.69 ppm for genotoxicity. There are three other studies: Fracasso et al. (2010) (0.012 ppm), Pitarque et al. (1996) (0.3 ppm) and Göethel et al. (2014) (0.6 ppm) from the fuel sector that score above the median with more certain NOAECs. Using this set of studies as a sensitivity analysis, a NOAEC of 0.45 ppm results. These analyses suggest that a NOAEC of 0.5 ppm is justified.


OEL derivation

Based on haematology studies

Method 1: (Use of the LOAEC)


2.19 ppm (based on three studies with a more certain LOAEC that are high quality, i.e. top tertile quality score). This is a conservative interpretation that does not consider that four other high-quality studies showed LOAECs of 7−8 ppm.


• Dose-response (LOAEC to NOAEC). 2.19 ppm is the lowest level of exposure among three high quality studies with more certain LOAECs. Most other high-quality studies show a higher LOAEC. In addition, there are other high-quality studies (viz. Pesatori et al., 2009; Schnatter et al., 2010; Ward et al., 1996) which report NOAECs for exposure levels similar to 2 ppm. Given this degree of potential overlap in LOAECs and NOAECs and the conservative selection of 2.19 ppm, the factor should be lower than the usual value of 3. A value of 2 is recommended.

• Intraspecies. A factor lower than 3 is recommended when a reasonably large human study is used in which a range of sensitivities are already present and extrapolations from the study data are to other occupational populations. In aggregate, the LOAEC studies considered included >2700 benzene exposed individuals. In addition, it can be seen that the lowest LOAECs (Qu et al., 2003; Zhang et al., 2016) are those based on Chinese workers, who may be a more sensitive population. Thus, a value of 2 is recommended, although a value of 1 would not be unreasonable since the aggregate value is from studies showing the lowest LOAECs and the studies cover diverse populations, already including potentially sensitive sub-populations.

OEL = 2.19 ppm / 4 (= 2 × 2) = 0.55 ppm (METHOD 1)


Method 2: (Use of NOAECs)

Method 2 is derived from the NOAECs of four studies of high quality.

NOAECs that are near or above the LOAEC from above are not considered; thus the Schnatter et al., 2010 study (NOAEC 2.9 ppm) and the Ward et al., 1996 study (NOAEC 2.2 ppm) are excluded. A NOAEC is usually preferred to a LOAEC, provided that the lack of effect can be observed in a clear and precise way.


NOAECs from four high quality studies (i.e. top tertile) are used as the basis for a weighted NOAEC of 0.58 ppm. These studies are: Collins et al., 1991; Koh et al., 2015; Pesatori et al., 2009; and Swaen et al., 2010. The NOAEC studies considered (Collins et al., 1991, 1997; Khuder et al., 1999; Koh et al., 2015; Pesatori et al., 2009; Swaen et al., 2010; Tsai et al., 2004) included, in aggregate, >11,700 benzene exposed individuals. Three studies (Bogadi-Šare et al., 2003; Schnatter et al., 2010; Ward et al., 1996) that report NOAECs of 2−3 ppm are not included, because they overlap with the chosen LOAEC value. The arithmetic average of these NOAECs is 0.59 ppm.


• Dose response. A factor of 1 is suggested because the point of departure is derived from a NOAEC.

• Intra-species factor. A factor of 1 is suggested because a larger aggregate human population is used (>11,700 benzene exposed individuals) than in METHOD 1 (>2700 benzene exposed individuals).

Given that Method 1 (based on LOAECs) provides an OEL of 0.55 ppm (8 h TWA) and that Method 2 (based on NOAECs) provides an OEL of 0.59 ppm (8 h TWA), both methods would support an OEL of 0.5 ppm (8 h TWA).


The data supporting this position however are derived from worker studies examining effects in peripheral blood, while the target organ for benzene toxicity is bone marrow. An additional factor of two is proposed for possible subclinical effects in the bone marrow. We adopted ECHA RAC’s view that the associated uncertainty is relatively small and that an assessment factor of 2 would be appropriate (ECHA, 2018a, b). Although there is limited scientific experimental information available on this topic, and only in the rodent, French et al. (2015) and Ferris et al (1996) show that the bone marrow may (French et al 2015) or may not (Ferris et al 1996) be more sensitive than peripheral blood. The mouse micronucleus test results however cannot be simply translated to humans because of large differences in splenic function, such as removal of MN erythrocytes (Schlegel and MacGregor, 1983).

For workers, measured events in T-cells afford global assessments of in vivo mutagenicity but are not specific for bone marrow effects, while MN detected in reticulocytes are produced in reticulocyte precursors in the bone marrow (Albertini and Kaden, 2020). In humans only a small difference between MN in peripheral blood lymphocytes and reticulocytes was observed for radioiodine therapy (Stopper et al., 2005).

This available information overall would imply a small assessment factor. Therefore, an additional factor of two is proposed for possible subclinical effects in the bone marrow until additional research clarifies the sensitivity of peripheral blood versus bone marrow effects. This additional factor would support an OEL of 0.25 ppm (8 h TWA).
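The haematology-based derivation reduces to dividing a point of departure by the product of the assessment factors. The sketch below restates that arithmetic with the values given in the text; the function name and the factor labels are illustrative, and the step from 0.55/0.59 ppm to the supported value of 0.5 ppm is taken from the text rather than computed.

```python
from math import prod


def oel(point_of_departure_ppm, factors):
    """Divide a point of departure (ppm) by the product of the assessment factors."""
    return point_of_departure_ppm / prod(factors.values())


# Method 1: LOAEC-based, with factors of 2 (LOAEC to NOAEC) and 2 (intraspecies).
method_1 = oel(2.19, {"LOAEC to NOAEC": 2, "intraspecies": 2})

# Method 2: NOAEC-based, with both factors set to 1.
method_2 = oel(0.59, {"dose response": 1, "intraspecies": 1})

# Both methods are said to support 0.5 ppm (8 h TWA); the additional factor of 2
# for possible subclinical bone marrow effects is then applied to that value.
SUPPORTED_OEL_PPM = 0.5
with_bone_marrow_factor = oel(SUPPORTED_OEL_PPM, {"bone marrow": 2})

print(f"Method 1: {method_1:.4f} ppm")
print(f"Method 2: {method_2:.2f} ppm")
print(f"With bone marrow factor: {with_bone_marrow_factor:.2f} ppm")
```

Keeping the factors in a labelled mapping makes it easy to see the effect of the alternative intraspecies factor of 1 that the text describes as not unreasonable.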


Based on genotoxicity studies

Method 1: (Use of the LOAEC)


This preferred approach is based on four high quality (top tertile) studies in the factory setting, each with a more certain LOAEC (Table 6). A fifth study (Zhang et al., 2007), which showed a higher LOAEC of 13.6 ppm, was not considered. This preferred derivation is supported by the additional sensitivity analyses summarized previously, which consider the fuel sector as well as the factory sector, and by the alternative definition of “high quality” using studies above the median rather than the top tertile.



• Dose-response (LOAEC to NOAEC). The lowest level of exposure among the four high quality (top tertile) studies is >2.33 ppm. Subsequently, a NOAEC of 0.69 ppm was calculated (see below). Other NOAECs which were near or greater than the LOAEC were not considered. In addition, the preferred LOAEC is reported as greater than 2.33 ppm, so 2.33 ppm should be regarded as the minimum preferred value. Given the degree of potential overlap in LOAECs and NOAECs, and the fact that there is some uncertainty in the inequality >2.33 ppm, the factor should be lower than the usual value of 3. A value of 2 is recommended.

• Intraspecies. A factor lower than 3 is recommended when a reasonably large human study is used in which a range of sensitivities is already present and the study data are extrapolated to other occupational populations. In aggregate, the LOAEC studies considered included >2700 benzene exposed individuals. In addition, all the LOAECs are based on Chinese workers, who may be a more sensitive population. Thus, a value of 2 is recommended. A value of 1 could also be considered: because a possibly more sensitive population generates the LOAEC, sensitive sub-populations may already have been accounted for in the selection of this LOAEC.

OEL = 2.33 ppm / 4 (= 2 × 2) = 0.58 ppm (Method 1)


Method 2: (Use of NOAECs)

Method 2 is derived from the NOAECs of two studies of high quality in the fuel sector since studies in the factory sector showed higher NOAECs when compared to the preferred LOAEC. NOAECs that are near or above the LOAEC from above are not considered, thus this could be considered a conservative approach.


NOAECs from two high quality studies are used as the basis for a weighted NOAEC of 0.69 ppm. The studies of Zhang et al., 2011 (NOAEC = 4.95 ppm) and Bogadi-Šare et al., 2003 (NOAEC = 8 ppm) were not considered, thus the value of 0.69 ppm may be conservative. On the other hand, only two studies are used to calculate the aggregate NOAEC, which could balance the conservative nature of the selection of studies that were included. Concordance with Method 1, which is arguably based on stronger data (average quality score of LOAEC studies = 17.25, average quality score of NOAEC studies = 14.5), would also justify an intra-species factor of 1.

OEL = 0.69 ppm (Method 2)
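The arithmetic behind the two genotoxicity-based values can be sketched as follows. This is only an illustrative sketch: the function name `derive_oel` and the dictionary keys are hypothetical and not part of the dossier, and the dose-response factor of 1 for Method 2 is inferred from the fact that the stated OEL equals the weighted NOAEC.

```python
from math import prod


def derive_oel(point_of_departure_ppm, assessment_factors):
    """Divide a point of departure (ppm, 8 h TWA) by the product of its assessment factors.

    Hypothetical helper for illustration only; not taken from the dossier.
    """
    return point_of_departure_ppm / prod(assessment_factors.values())


# Method 1: LOAEC of >2.33 ppm, with a factor of 2 for LOAEC-to-NOAEC
# extrapolation and a factor of 2 for intraspecies variability.
method_1 = derive_oel(2.33, {"dose_response": 2, "intraspecies": 2})

# Method 2: weighted NOAEC of 0.69 ppm. The intraspecies factor of 1 is stated
# in the text; the dose-response factor of 1 is an assumption inferred from
# OEL = NOAEC.
method_2 = derive_oel(0.69, {"dose_response": 1, "intraspecies": 1})

print(f"Method 1 OEL: {method_1:.2f} ppm (8 h TWA)")
print(f"Method 2 OEL: {method_2:.2f} ppm (8 h TWA)")
```

Under these assumptions the sketch reproduces the values given above for Method 1 and Method 2 when rounded to two decimal places.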


Given that the haematology data suggest an OEL of 0.5 ppm and that the genotoxicity-based OELs are 0.58 ppm (Method 1) and 0.69 ppm (Method 2), both datasets would support an OEL of 0.5 ppm (8 h TWA).

As was the case for haematotoxicity, the data supporting this position are mainly derived from worker studies examining effects in peripheral blood (except for Xing et al., 2010). An additional factor of two is proposed for possible subclinical effects in the bone marrow until additional research clarifies the sensitivity of peripheral blood versus bone marrow effects. This additional factor would support an OEL of 0.25 ppm (8 h TWA) for both haematotoxicity and genotoxicity endpoints.
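The final step for both endpoints can be illustrated with a short consistency check. It is a sketch under stated assumptions: the variable names are hypothetical, the four derived OELs are those quoted in this section, and the supported value of 0.5 ppm is taken from the text rather than computed.

```python
# Derived OELs (ppm, 8 h TWA) as quoted in this section.
derived_oels_ppm = {
    ("haematotoxicity", "Method 1 (LOAEC)"): 0.55,
    ("haematotoxicity", "Method 2 (NOAEC)"): 0.59,
    ("genotoxicity", "Method 1 (LOAEC)"): 0.58,
    ("genotoxicity", "Method 2 (NOAEC)"): 0.69,
}

# Value the text states both datasets would support (taken as given, not computed).
supported_oel_ppm = 0.5

# Additional assessment factor proposed for possible subclinical bone marrow effects.
bone_marrow_factor = 2

# The supported OEL should not exceed any individually derived value.
all_support = all(value >= supported_oel_ppm for value in derived_oels_ppm.values())

proposed_oel_ppm = supported_oel_ppm / bone_marrow_factor

print(f"All derived OELs are at or above {supported_oel_ppm} ppm: {all_support}")
print(f"Proposed OEL after bone marrow factor: {proposed_oel_ppm} ppm (8 h TWA)")
```

The check only confirms that 0.5 ppm lies at or below each derived value; it does not reproduce how the individual LOAECs and NOAECs were selected or weighted.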

Applicant's summary and conclusion

The data presented by Schnatter et al. (2020) define a benzene LOAEC of 2 ppm (8 h TWA) and a NOAEC of 0.5 ppm (8 h TWA). However, the use of peripheral blood measures of bone marrow effects introduces some scientific uncertainty. Until the issue of bone marrow sensitivity compared to that of peripheral blood is resolved, an extra assessment factor of two is therefore applied. An OEL of 0.25 ppm (8 h TWA) for benzene is the best estimate based on available human data.
Executive summary:

This paper derives an occupational exposure limit for benzene using quality assessed data. Seventy-seven genotoxicity and thirty-six haematotoxicity studies in workers were scored for study quality with an adapted tool based on that of Vlaanderen et al., 2008 (Environ. Health Perspect. 116, 1700−5). These endpoints were selected as they are the most sensitive and relevant to the proposed mode of action (MOA), and protecting against these will protect against benzene carcinogenicity. Lowest and No Observed Adverse Effect Concentrations (LOAECs and NOAECs) were derived from the highest quality studies (i.e. those ranked in the top tertile or top half) and further assessed as being “more certain” or “less certain”. Several sensitivity analyses were conducted to assess whether alternative “high quality” constructs affected conclusions. The lowest haematotoxicity LOAECs showed effects near 2 ppm (8 h TWA), and no effects at 0.59 ppm. For genotoxicity, studies also showed effects near 2 ppm and showed no effects at about 0.69 ppm. Several sensitivity analyses supported these observations. These data define a benzene LOAEC of 2 ppm (8 h TWA) and a NOAEC of 0.5 ppm (8 h TWA). Allowing for possible subclinical effects in bone marrow not apparent in studies of peripheral blood endpoints, an OEL of 0.25 ppm (8 h TWA) is proposed.