Medical Image Analysis 69 (2021) 101949
Contents lists available at ScienceDirect
Medical Image Analysis
journal homepage: www.elsevier.com/locate/media
TSegNet: An efficient and accurate tooth segmentation network on 3D dental model
Zhiming Cui a,d, Changjian Li b,a, Nenglun Chen a, Guodong Wei a, Runnan Chen a, Yuanfeng Zhou c, Dinggang Shen d,e,f,∗, Wenping Wang a,∗

a Department of Computer Science, The University of Hong Kong, Hong Kong, China
b Department of Computer Science, University College London, London, UK
c Department of Software Engineering, Shandong University, Jinan, China
d School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
e Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
f Department of Artificial Intelligence, Korea University, Seoul 02841, Republic of Korea
Article info
Article history:
Received 29 June 2020
Revised 6 November 2020
Accepted 12 December 2020
Available online 19 December 2020
Keywords:
Dental model segmentation
Tooth centroid prediction
Confidence-aware cascade segmentation
3D point cloud
Abstract
Automatic and accurate segmentation of dental models is a fundamental task in computer-aided dentistry. Previous methods can achieve satisfactory segmentation results on normal dental models; however, they fail to robustly handle challenging clinical cases such as dental models with missing, crowded, or misaligned teeth before orthodontic treatment. In this paper, we propose a novel end-to-end learning-based method, called TSegNet, for robust and efficient tooth segmentation on 3D scanned point cloud data of dental models. Our algorithm detects all the teeth using a distance-aware tooth centroid voting scheme in the first stage, which ensures accurate localization of tooth objects even with irregular positions on abnormal dental models. Then, a confidence-aware cascade segmentation module in the second stage is designed to segment each individual tooth and resolve ambiguities caused by the aforementioned challenging cases. We evaluated our method on a large-scale real-world dataset consisting of dental models scanned before or after orthodontic treatment. Extensive evaluations, ablation studies and comparisons demonstrate that our method generates accurate tooth labels robustly in various challenging cases and significantly outperforms state-of-the-art approaches, by 6.5% in Dice coefficient and 3.0% in F1 score in terms of accuracy, while achieving a 20-fold speedup in computation time.
Fig. 1. Four typical examples with extreme appearances, including (a) missing teeth, (b) additional braces, (c) blurred boundary signals between incisors and the gum, and (d) crowded teeth.
(Zou et al., 2015). However, these methods typically rely on domain-specific knowledge and lack the robustness required to represent intricate tooth shape appearances. Recently, with the advance of deep learning, more learning-based methods employing convolutional neural networks (CNNs) (Xu et al., 2018; Tian et al., 2019) or mesh-based graph neural networks (Lian et al., 2019) have been proposed with compelling performance. Unfortunately, most of these methods make a strongly restrictive assumption that the dental models consist of a complete set of natural teeth, which is difficult to satisfy; for example, nearly 70% of the patients in orthodontic clinics are at the tooth exfoliation stage, so they often do not have a fixed number of teeth (Cobourne and DiBiase, 2015). Mask-MCNet (Zanjani et al., 2019) transforms the dental model into point cloud data and uses a volumetric anchor-based region proposal network for tooth detection and segmentation. However, its proposal generation module results in resolution reduction and requires huge memory resources.

Another line of deep learning methods, including PointNet (Qi et al., 2017a), PointNet++ (Qi et al., 2017b) and PointCNN (Li et al., 2018), directly takes 3D point cloud data (e.g., mesh vertices) as input and learns deep geometric features for classification and segmentation in general geometric processing tasks. A major limitation of these methods, when applied to our tooth segmentation task, is that it is difficult to accurately separate neighboring teeth with similar shape appearances, such as incisors, premolars and molars, especially on dental models with missing teeth (Fig. 9, PointNet++).
To tackle these issues, we present a novel end-to-end learning-based method for automatic tooth segmentation on 3D dental models. The core of our method is a two-stage neural network which first detects all the teeth and then segments each detected tooth accurately. In the tooth detection stage, instead of the traditional approach that utilizes bounding boxes to crop the detected objects (He et al., 2017; Hou et al., 2019; Zhou and Tuzel, 2018), we exploit the centroid (i.e., the center of mass) of a tooth to identify each tooth object, based on our observation that regardless of the tooth shape, position and orientation, the centroid is a stable feature point inside the tooth shape. It is therefore a more reliable signal than the bounding box, especially when the teeth are relatively small and packed tightly. In this way, the tooth detection problem is naturally converted to a tooth centroid prediction problem. To predict all the tooth centroids reliably, we design a distance-aware voting scheme that generates the tooth centroids from subsampled points with reliably learned local context. In the second stage of individual tooth segmentation, we first crop the points and corresponding features with the guidance of the predicted tooth centroid, and combine them as one tooth proposal. Subsequently, all the tooth proposals are sent to the segmentation module to generate individual tooth labels. Moreover, to improve segmentation accuracy, especially for tooth boundaries with blurred signals, we introduce a point-wise confidence map based on a cascade network to enhance the label learning with an attention mechanism. The newly proposed components and loss functions efficiently produce an accurate tooth segmentation and boost the usability of our algorithm in real-world clinical scenarios.
Our main contributions are summarized as follows:
• We propose a novel pipeline that formulates dental model segmentation as two sub-problems: robust tooth centroid prediction and accurate individual tooth segmentation on point cloud data.
• We design a distance-aware voting scheme to efficiently predict all tooth centroids. Besides, a confidence-aware attention mechanism is introduced to improve segmentation in noisy areas.
• Extensive evaluations and ablation studies are conducted on a dataset collected from dental clinics. Compared with the state-of-the-art methods, the proposed framework achieves superior results both qualitatively and quantitatively by a significant margin.
The rest of the paper is organized as follows. In Section 2, we briefly review the existing methods for dental model segmentation and point cloud learning. Section 3 describes the proposed methodology in detail. In Section 4, we present the quantitative and qualitative results of our method and compare with the state-of-the-art methods; we also discuss the effectiveness of the different network components and the limitations of our approach in this section. Section 5 concludes our study.
2. Related works

2.1. Dental model segmentation

Non-learning based methods. Many traditional methods based on handcrafted geometric features have been proposed to segment 3D dental models. These methods can be roughly divided into three categories: surface curvature based methods, surface contour line based methods, and harmonic field based methods.
The surface curvature based methods aim to identify the tooth boundaries. For example, Yuan et al. (2010) calculated the minimum curvatures of the tooth surface and extracted the boundary between the tooth and soft tissues. Zhao et al. (2006) proposed an interactive method based on the curvature values of the triangle mesh. Kumar et al. (2011) developed a system in which users can set a certain curvature threshold via an intuitive slider. In addition, Li et al. (2007) integrated fast marching watersheds and manual threshold regulation to improve segmentation accuracy and reduce processing time. Kronfeld et al. (2010) minimized user annotation by positioning a snake around the cusp points of each tooth. Wu et al. (2014) proposed to take advantage of morphological techniques to facilitate effective tooth separation. However, these surface curvature based methods are very sensitive to the variation of tooth surfaces and appearances.
The methods based on surface contour lines are more reliable in generating tooth boundaries because the contour lines are manually annotated. Specifically, these methods (Sinthanayothin and Tharanont, 2008; Yaqi and Zhongke, 2010) allow users to manually select tooth boundary landmarks on a dental model. Then, the contour lines computed from the geodesic information of neighboring landmarks form the desired tooth boundaries. Although achieving good performance, these methods require
users to translate or rotate the 3D model multiple times to select the particular landmarks carefully, which is tedious and time-consuming.

As for the third category (Zou et al., 2015), the framework allows users to annotate a limited number of surface points as priors and employs a harmonic field to segment the teeth successfully. Compared to other interactive methods, this method presents a more efficient and smarter user interface with minimal interaction. However, the input model is assumed to be manifold, which requires a sophisticated preprocessing step.
Another group of methods that aim to effectively segment 3D dental models are based on 2D images. For example, Yamany and El-Bialy (1999) built a 2D image representation using the curvature and surface normal information, and extracted the structures of high/low curvatures as the segmentation results. Kondo et al. (2004) proposed to detect tooth features on both plane-view and panoramic-view images. Similarly, some works (Wongwaen and Sinthanayothin, 2010; Grzegorzek et al., 2010) developed systems to find the contour or cutting points on 2D sectional images and then converted them back to 3D space for separating individual teeth. Unfortunately, these methods often fail when dental models have severe malocclusion.
Learning based methods. Recently, with the development of deep learning techniques, many studies leverage neural networks on 2D images, meshes and point clouds to extract teeth from a dental model. Specifically, Xu et al. (2018) used a 2D CNN to classify the image produced from the pre-defined handcrafted features of each mesh face. Tian et al. (2019) employed a 3D CNN and a sparse voxel octree for tooth segmentation. In addition, Lian et al. (2020) and Sun et al. (2020) integrated a series of graph-constrained learning modules to hierarchically extract multi-scale contextual features for automatic labeling on raw dental surfaces. However, since these methods typically group points or faces into pre-defined clusters, they usually fail to process data with missing teeth, which is common in real-world clinical scenarios. In addition, Zanjani et al. (2019) extended Mask R-CNN (He et al., 2017) to 3D point clouds extracted from dental models, but it suffered from low efficiency and segmentation artifacts.
2.2. 3D point cloud learning

3D understanding is an essential task in computer vision. State-of-the-art methods take as input various kinds of 3D data to perform tasks such as 3D shape segmentation, detection and classification. Among the input data representations, the 3D point cloud is becoming more popular since it is flexible and memory efficient. Qi et al. (2017a) designed a novel network to take as input an unstructured point cloud and learn translation-invariant geometric features. Several state-of-the-art methods (Qi et al., 2017b; Li et al., 2018; Wu et al., 2019) improved the framework by recursively applying neural networks on a nested partitioning of the input point cloud, which has the ability to learn local features with increasing contextual scales and achieved state-of-the-art performance on many segmentation and classification tasks. However, their performance is limited in our specific task, because a tooth is very small compared with the whole dental model.
3. Methods

In this section, we present a novel framework for tooth segmentation on 3D dental models. As shown in Fig. 2, our approach takes as input the 3D point cloud extracted from the input dental model, and aims to assign every point a unique label. Specifically, we first introduce the distance-aware tooth centroid prediction module that generates a set of candidate points for the tooth centroids (Section 3.1). Then, we propose a confidence-aware attention mechanism to segment each tooth guided by the predicted tooth centroid (Section 3.2). At the testing stage, we utilize a tooth centroid clustering algorithm to speed up the segmentation, and directly transfer the point cloud labels back to the dental model (Section 3.3).
3.1. Distance-aware tooth centroid prediction

To identify each tooth object properly, we formulate detection as a tooth centroid prediction problem. Formally, given an input dental model, we first extract the mesh vertices and uniformly downsample them to obtain the input point cloud P with dimension N × 6, where N = 16,000 is the number of sampled input points and each point is described by a 6-D vector. Specifically, in addition to the 3D coordinates (3 dims), we also acquire the normal vector (3 dims) at each point from the dental mesh as an additional feature to provide auxiliary information. Given the input point cloud P, we first normalize it within a unit ball, and extract geometric features using PointNet++ as the backbone encoder, which includes three blocks of multi-layer perceptrons (MLPs), each followed by a batch normalization layer and a ReLU nonlinearity. The output of the backbone encoder is a set of subsampled points F with dimension M × (3 + 256), where M = 256 is the number of subsampled points. For each point, in addition to the 3D coordinates, there is a 256-D feature vector encoding the local contextual information around it.
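For concreteness, the input preparation described above can be sketched as follows. This is a minimal NumPy illustration under our own naming; it is not the authors' released code:

```python
import numpy as np

def build_input_cloud(vertices, normals, n_points=16000, seed=0):
    """Sketch of the input preparation: uniformly subsample the mesh
    vertices to N points, attach per-point normals, and normalize the
    coordinates into a unit ball. Names are ours, not from the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(vertices), size=n_points, replace=False)
    xyz = vertices[idx]                      # (N, 3) coordinates
    # Normalize coordinates into a unit ball centered at the mean.
    xyz = xyz - xyz.mean(axis=0)
    xyz = xyz / np.linalg.norm(xyz, axis=1).max()
    # Each point is a 6-D vector: 3D coordinates plus the normal vector.
    return np.concatenate([xyz, normals[idx]], axis=1)  # (N, 6)
```

The network's farthest point sampling inside PointNet++ then reduces these N points to the M = 256 subsampled points F.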
For the dental model of an upper or lower jaw, we have the ground truth tooth centroid set C = {c_1, c_2, ..., c_k}, and the goal is to predict all tooth centroids from the subsampled points F using the learned local features. Therefore, we design a displacement function to learn the offset of each subsampled point to its corresponding tooth centroid c_i. This is feasible because if a subsampled point lies near a tooth, the encoded features capturing the tooth shape have the ability to predict the centroid of that nearby tooth. Specifically, as shown in Fig. 2, the MLPs take as input the subsampled points F with the learned local features and output a set of M displacement vectors ΔC = {(Δx_i, Δy_i, Δz_i)}. Finally, the set of regressed centroid points, Ĉ = {(x_i + Δx_i, y_i + Δy_i, z_i + Δz_i)}_{i ∈ [1, M]}, is generated to approximate the ground truth set C, where (x_i, y_i, z_i) denotes the 3D coordinates of the i-th subsampled point F_i.
However, since the subsampled points F are uniformly sampled from the input point cloud by the farthest point sampling operation, we observe that some subsampled points may be far from any tooth, e.g., on the dental palate (Fig. 4(a)); such points encode little information about any tooth object and cannot predict reliable tooth centroids. To filter out such points automatically, we exploit another distance estimation branch to regress a distance value for each subsampled point, measuring the closeness of the point to its nearest ground truth tooth centroid, as shown in Fig. 2.

To train the network, we propose three novel loss terms to supervise the prediction of all tooth centroids.
Distance estimation. To remove the subsampled points that are far away from any tooth, we first measure the distance between each subsampled point and its nearest tooth centroid, and set it as the ground truth of the distance estimation. We then utilize a smooth L1 loss to calculate the regression error. Let F^{(3)} denote the 3D coordinates of the subsampled points F. The loss function of the distance estimation is defined as follows:

L_D = \sum_{f_i^{(3)} \in F^{(3)}} L_{\mathrm{smooth1}}\left( \hat{d}_i - \min_{c_k \in C} \left\| f_i^{(3)} - c_k \right\|_2 \right),   (1)

where

L_{\mathrm{smooth1}}(x) = \begin{cases} 0.5\, x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise,} \end{cases}   (2)
Fig. 2. The two-stage network architecture and the algorithm pipeline. The dental mesh is first fed into the centroid prediction network in stage one, then the cropped
features based on the regressed points go through the tooth segmentation network in stage two. Finally, we derive the accurately segmented tooth objects. The numbers 1,
2, 3 in the proposal box, represent the input signals for the segmentation network, i.e., cropped coordinate feature, propagated point feature and dense distance field feature
respectively. See algorithm details in Section 3 .
Fig. 3. The ISO standard tooth numbering system and the corresponding color cod-
ing.
where d̂_i refers to the predicted distance value from the subsampled point F_i to its nearest tooth centroid. With this distance estimation module in the framework, we filter out the subsampled points that have a relatively large predicted distance at both the training and testing stages. The threshold α is set to 0.2 on the normalized point sets, which is consistent with the receptive field of the last set abstraction layer in the encoder.
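The supervision of Eqs. (1)-(2) can be sketched in NumPy as follows; the paper computes this inside the network with automatic differentiation, so this is only an illustrative forward pass:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 of Eq. (2): quadratic near zero, linear elsewhere."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x**2, x - 0.5)

def distance_estimation_loss(pred_dist, sub_xyz, centroids):
    """Eq. (1): regress each subsampled point's predicted distance
    against its true distance to the nearest ground-truth centroid."""
    # (M, K) pairwise distances from subsampled points to centroids
    d = np.linalg.norm(sub_xyz[:, None, :] - centroids[None, :, :], axis=2)
    target = d.min(axis=1)             # distance to the nearest centroid
    return smooth_l1(pred_dist - target).sum()
```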
Chamfer distance. In the tooth centroid prediction branch, we train the network by minimizing the distance between the regressed centroid set Ĉ and the ground truth centroid set C, which is formulated to consider the following two factors: (1) every tooth centroid in C should correspond to at least one regressed centroid in Ĉ (surjection); (2) every regressed centroid in Ĉ should correspond to exactly one tooth centroid in C (injection). It is a bidirectional distance minimization and we use the Chamfer distance to supervise the tooth centroid prediction. The loss function L_CD of the two sets of centroids is formulated as:

L_{CD} = \sum_{\hat{c}_i \in \hat{C},\, \hat{d}_i < \alpha} \min_{c_k \in C} \| \hat{c}_i - c_k \|_2^2 + \sum_{c_k \in C} \min_{\hat{c}_i \in \hat{C},\, \hat{d}_i < \alpha} \| c_k - \hat{c}_i \|_2^2,   (3)

where α = 0.2 is introduced in the distance estimation term.
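As an illustration, Eq. (3) with the distance-aware filtering can be sketched as (a NumPy forward pass, not training code):

```python
import numpy as np

def chamfer_loss(pred_centroids, gt_centroids, pred_dist, alpha=0.2):
    """Eq. (3): bidirectional Chamfer distance between the regressed
    centroids (kept only if their predicted distance is below alpha)
    and the ground-truth tooth centroids."""
    kept = pred_centroids[pred_dist < alpha]   # distance-aware filtering
    # (M', K) squared distances between kept predictions and GT centroids
    d2 = ((kept[:, None, :] - gt_centroids[None, :, :]) ** 2).sum(axis=2)
    # every kept prediction -> nearest GT centroid, and vice versa
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()
```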
Separation loss. The tooth centroid prediction with distance estimation and Chamfer distance supervision already achieves excellent performance. However, we still observe that a few predicted centroids are located near the boundary between two adjacent teeth, especially for the incisors of the lower jaw, which are relatively small and closely packed, as shown in Fig. 5. This happens because these ambiguous centroids receive little penalization from the Chamfer distance loss. To tackle this issue, we add a separation loss, defined as:

L_S = \sum_{\hat{c}_i \in \hat{C},\, \hat{d}_i < \alpha} \frac{\Delta d_1}{\Delta d_2},   (4)

where Δd_1 and Δd_2 are the distances of the predicted tooth centroid ĉ_i to its first and second closest centroids in C, respectively. This term encourages each predicted centroid to be as close as possible to a single, correctly corresponding tooth centroid in the ground truth set C.
Finally, our training loss function L_cp for robust centroid point prediction is obtained by combining the three loss terms as follows:

L_{cp} = L_D + L_{CD} + \beta L_S,   (5)
where β is the balancing weight, empirically set to 0.1 for all experiments.
3.2. Confidence-aware tooth segmentation

We now discuss how to use the accurately predicted tooth centroids as guidance information to perform individual tooth segmentation.

Tooth proposal generation. Since each tooth is found by at least one predicted centroid, we first generate tooth proposals according to the predicted centroids. Instead of utilizing a bounding box to crop a tooth object, we crop the nearest n = 4096 points in the input point cloud based on the Euclidean distance to the predicted tooth centroid; these are roughly a quarter of the points of the input dental model (16,000), which ensures a complete tooth is included in the proposal. As highlighted with the red dotted box in the top row of Fig. 2, a tooth proposal is represented by three components. The first is the cropped point coordinates (3 dims), and the second is the cropped points' propagation features (32 dims). The last component is a dense distance field df^{(i)} (1 dim) for the i-th proposal, defined as:

df_j^{(i)} = \exp\left( -4 \left\| \hat{c}_i - \hat{p}_j^{(i)} \right\|^2 \right),   (6)

where ĉ_i is the predicted centroid of proposal i, while p̂_j^{(i)} is the 3D coordinate of point j in the cropped points. With this distance field, the foreground tooth corresponding to the predicted centroid has a higher value than the other teeth in the cropped points, so it serves as a guidance map for the segmentation sub-network.

Finally, we directly concatenate the three individual features and feed them into the segmentation network to segment the foreground tooth shape.
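The cropping and the distance field of Eq. (6) can be sketched as follows; the propagated 32-D features come from the network itself, so here they are simply passed in as an array (names are our own):

```python
import numpy as np

def make_proposal(cloud_xyz, point_feats, centroid, n=4096):
    """Sketch of tooth proposal generation: crop the n points nearest
    to a predicted centroid and attach the dense distance field of
    Eq. (6) as a guidance channel."""
    d = np.linalg.norm(cloud_xyz - centroid, axis=1)
    idx = np.argsort(d)[:n]                  # n nearest points
    df = np.exp(-4.0 * d[idx] ** 2)          # Eq. (6): guidance map
    # concatenate coordinates (3), propagated features (32), field (1)
    return np.concatenate([cloud_xyz[idx], point_feats[idx], df[:, None]],
                          axis=1)            # (n, 36)
```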
Confidence-aware cascaded segmentation. The segmentation network, built upon PointNet++, takes as input the concatenated feature of dimension n × (3 + 32 + 1) and outputs the binary label of each point as belonging to the tooth shape or the background. Although PointNet++ demonstrates excellent performance in point cloud segmentation, it is hard to separate the tooth shape cleanly from the surrounding gum due to the blurred geometric signals near the tooth boundary and the large variation of tooth shapes (Fig. 6). Thus we first design our network using a cascaded segmentation scheme with two segmentation sub-networks S1 and S2, where S2 takes as input both the proposal features and the 1-dimensional segmentation result from S1. In addition, to further improve segmentation accuracy near the boundary of complicated tooth shapes (Fig. 7(a)), we propose a novel confidence-aware attention mechanism for tooth segmentation; the details are given below.

In the first segmentation sub-network S1, in addition to predicting the segmentation results of the proposals, we introduce another branch to estimate a point-wise confidence value λ, measuring the accuracy of the segmentation, defined as:

L_{S_1} = \frac{1}{n} \sum_{j}^{n} \left( \left( L_{S_1, j}^{\mathrm{bce}} \times \lambda_j \right)^2 + \left( 1 - \lambda_j \right)^2 \right),   (7)
where L_{S_1, j}^{bce} denotes the point-wise binary cross-entropy (BCE) loss between the predicted point label and the ground truth label, and λ is trained in an unsupervised manner to measure the ambiguity of the predicted label; the higher the value, the more accurate the prediction. A visual illustration of the point-wise confidence map is presented in Fig. 2 (the weight map). Clearly, boundary areas with blurred geometric signals tend to have lower confidence values.
In the second segmentation sub-network S2, we convert the confidence map into a normalized weight map that emphasizes the segmentation of areas with lower λ, e.g., the boundary area. The training loss is:

L_{S_2} = \frac{1}{n} \sum_{j}^{n} \left( 1.0 + W_j \right) \times L_{S_2, j}^{\mathrm{bce}},   (8)

where W_j = 1.0 − λ_j is a point-wise value on the weight map and L_{S_2, j}^{bce} refers to the point-wise BCE loss in S2.

In addition, to identify the foreground tooth ID in each proposal, we utilize the global feature extracted in S2 to perform a classification and calculate the cross-entropy loss L_ID to supervise this task. Finally, we train the cascaded segmentation network using the loss function:

L_{seg} = L_{S_1} + L_{S_2} + L_{ID}.   (9)
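The interaction of Eqs. (7) and (8) can be sketched as a NumPy forward pass (illustrative only): S1's BCE is modulated by the confidence λ, with a (1 − λ)² regularizer that keeps λ from collapsing to zero, while S2's BCE is re-weighted toward low-confidence points via W = 1 − λ.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Point-wise binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def cascade_losses(p1, p2, labels, lam):
    """Eq. (7) for sub-network S1 and Eq. (8) for sub-network S2."""
    n = len(labels)
    l_s1 = (((bce(p1, labels) * lam) ** 2 + (1.0 - lam) ** 2).sum()) / n
    l_s2 = (((1.0 + (1.0 - lam)) * bce(p2, labels)).sum()) / n
    return l_s1, l_s2
```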
3.3. Centroid clustering and label prediction

After the previous step, the predicted tooth centroids exhibit a clustering tendency, as shown in Figs. 4, 5 and 8. To remove redundant tooth centroids and speed up processing, in both the training and testing phases, we first apply the DBSCAN (Ester et al., 1996) clustering algorithm to all the predicted centroids, controlled by the distance threshold l. Here, l is empirically set to 0.015, which is relatively small compared to the tooth size in the normalized point cloud data. For every cluster, we calculate the representative average centroid point and derive the corresponding proposal for segmentation.
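The paper uses DBSCAN for this step; as a self-contained stand-in, a simplified greedy grouping by the distance threshold l illustrates the same reduction from many centroid votes to one representative per tooth (this is not DBSCAN itself):

```python
import numpy as np

def cluster_centroids(points, l=0.015):
    """Greedily group predicted centroids within distance l of a seed
    point and return each group's average as the representative
    centroid. A simplified stand-in for the DBSCAN step."""
    reps = []
    remaining = list(range(len(points)))
    while remaining:
        seed = remaining.pop(0)
        close = [i for i in remaining
                 if np.linalg.norm(points[i] - points[seed]) < l]
        members = [seed] + close
        remaining = [i for i in remaining if i not in close]
        reps.append(points[members].mean(axis=0))
    return np.asarray(reps)
```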
During the testing phase, after individual tooth extraction on the generated proposals, the next step is to produce the labels for the input point cloud data. To this end, we first calculate the foreground point overlap of every two proposals. If the Intersection over Union (IoU) is higher than a threshold of 0.35, the two proposals are regarded as containing the same tooth; in this case, we average the point-wise label probabilities to fuse the overlapped points. Finally, the point cloud labels are directly transferred back to the dental surface based on trilinear interpolation.
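The IoU-based fusion can be sketched as follows (a NumPy illustration with our own naming; `masks` are boolean foreground masks over the same point index set, `probs` the corresponding per-point label probabilities):

```python
import numpy as np

def fuse_proposals(masks, probs, iou_thresh=0.35):
    """If two proposals' foreground point sets overlap with
    IoU > iou_thresh, treat them as the same tooth and average their
    point-wise label probabilities."""
    fused = probs.astype(float).copy()
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            inter = np.logical_and(masks[i], masks[j]).sum()
            union = np.logical_or(masks[i], masks[j]).sum()
            if union and inter / union > iou_thresh:
                avg = (probs[i] + probs[j]) / 2.0
                fused[i], fused[j] = avg, avg
    return fused
```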
In our implementation, we first train the centroid prediction network for 500 epochs, then connect the single tooth segmentation network and jointly train the framework for 100 epochs. We utilize the Adam solver with a fixed learning rate of 1 × 10^{-3}. Using one Nvidia GeForce 1080Ti GPU, training takes about 4 h for the centroid prediction network and 18 h for the joint training.
4. Experiments and results

In this section, we evaluate our algorithm on a dataset collected from real-world clinics, including upper and lower jaws. The tooth identification is based on the dental notation system (ISO-3950) (Grace, 2000) (as shown in Fig. 3), which is consistent with the color coding of our segmentation results. The teeth subgroups for evaluation purposes in this section, i.e., incisor, canine, premolar, molar (in Tables 2 and 3), are set according to the types marked in Fig. 3 as well. All experiments are performed on a computer with an Intel(R) Xeon(R) V4 1.9 GHz CPU, a 1080Ti GPU, and 32 GB RAM.
4.1. Dataset and evaluation metrics

To train the network, we collected a set of dental models from patients before or after orthodontic treatment, including many cases with abnormal tooth shapes, such as crowded teeth, missing
Fig. 4. The qualitative comparison of the centroid prediction results with (b) or
without (a) the robust filter. The first row shows centroid point prediction results
with paired purple and red points indicating the start and end positions, while
the second row shows the corresponding segmentation results using bNet seg . The
wrongly predicted points lead to incorrect tooth segmentation (red color). (For in-
terpretation of the references to color in this figure legend, the reader is referred to
the web version of this article.)
Fig. 5. The qualitative comparison of the centroid points prediction. The first row
shows the predicted centroid points, while the second row shows the segmentation
results using bNet seg . Without the separation loss, bCNet cp -D-CD outputs ambiguous
points that cheat the clustering algorithm in the testing stage to miss some teeth,
as highlighted in the dotted boxes.
Fig. 6. The qualitative comparison of tooth segmentation with (b) or without (a)
the cascaded refinement. With the refinement module, bNet seg -C generates results
without artifacts in the tooth body part.
Fig. 7. The qualitative comparison of tooth segmentation with (b) or without (a)
the confidence-aware refinement. The tooth boundaries are highlighted in red boxes
and TSegNet generates a more accurate boundary. (For interpretation of the refer-
ences to color in this figure legend, the reader is referred to the web version of this
article.)
teeth and additional braces. The dataset includes a total of 2000 dental models (1000 upper jaws and 1000 lower jaws), where each dental surface contains about 150,000 faces and 80,000 vertices. To train the network, we randomly split the dataset into three subsets: 1500 models for training, 100 models for validation and 400 models for testing. To obtain the ground truth, we manually annotated the tooth-level labels, and the centroid of each tooth is calculated based on the labeled mask. To quantitatively evaluate the performance of our method, we use the mean distance (MeanD) and max distance (MaxD) metrics to validate the performance of the tooth centroid prediction, defined as:

\mathrm{MeanD}(R_1, R_2) = \frac{1}{|R_1|} \sum_{r_i^1 \in R_1} \min_{r_j^2 \in R_2} \| r_i^1 - r_j^2 \|_2^2,   (10)

\mathrm{MaxD}(R_1, R_2) = \max_{r_i^1 \in R_1} \min_{r_j^2 \in R_2} \| r_i^1 - r_j^2 \|_2^2,   (11)

where R_1 and R_2 represent two point sets. The two metrics are computed between the predicted tooth centroid set and the ground truth tooth centroid set in a bidirectional manner (Table 1). For the segmentation task, we utilize the dice similarity coefficient (DSC) metric to validate on the point cloud and the dental surface respectively, calculated as:
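Eqs. (10)-(11) can be sketched as follows (a NumPy illustration; applying it in both directions and swapping the arguments gives the bidirectional variant reported in Table 1):

```python
import numpy as np

def mean_max_distance(r1, r2):
    """Eqs. (10)-(11): for each point in R1, the squared distance to its
    nearest neighbor in R2; MeanD averages these, MaxD takes the worst
    case."""
    d2 = ((r1[:, None, :] - r2[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.min(axis=1)          # nearest-neighbor squared distance
    return nearest.mean(), nearest.max()
```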
\mathrm{DSC}_{point} = \frac{2 \times |L_{GT} \cap L_P|}{|L_{GT}| + |L_P|},   (12)

\mathrm{DSC}_{surface} = \frac{2 \times \mathrm{Area}(L_{GT} \cap L_P)}{\mathrm{Area}(L_{GT}) + \mathrm{Area}(L_P)},   (13)

where L_GT and L_P denote the ground truth tooth labels and the corresponding predicted labels, respectively. Note that the DSC of the dental surface is calculated in a manner weighted by the face area. Besides, the macro F1-score (F1) is used to measure the tooth identification accuracy (Opitz and Burst, 2019). In the following quantitative results, except for Fig. 10, we report the average values computed on the testing subset.
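Eq. (12) for a single binary tooth mask can be sketched as (the surface variant of Eq. (13) replaces point counts with mesh-face areas):

```python
import numpy as np

def dsc_point(labels_gt, labels_pred):
    """Eq. (12): Dice similarity coefficient on binary point labels."""
    inter = np.logical_and(labels_gt, labels_pred).sum()
    return 2.0 * inter / (labels_gt.sum() + labels_pred.sum())
```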
4.2. Ablation analysis of key components

We conduct extensive experiments to validate the effectiveness of our network components and loss functions. First, we build baseline networks for both the tooth centroid prediction and individual tooth segmentation tasks, denoted as bNet cp and bNet seg, respectively. For the tooth centroid prediction baseline network, we directly supervise all the subsampled points to translate to their nearest tooth centroids, while the segmentation baseline network is a single PointNet++ segmentation module without the confidence-aware cascade mechanism. All the alternative
Fig. 8. Representative segmentation results. From left to right: input, predicted centroid points, tooth segmentation on the point cloud, tooth segmentation on dental models
with two different views. The accurate segmentation boundary is highlighted in the boxes.
Fig. 9. The visual comparison of dental model segmentation results produced by different methods, with each row corresponding to a typical example of the upper or lower jaw. From left to right: the scanned dental surface, the ground truth, results of other methods (3rd-5th columns) and the result of our method (last column). Red dotted circles and arrows mark some segmentation details. '†' denotes a semi-automatic method. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
networks are derived by augmenting the baseline network with different network components or loss terms, and are trained on the same training dataset. We describe the details and present quantitative and qualitative results in the following section.
Benefits of distance estimation The distance estimator in the tooth centroid prediction module serves as a robust filter to remove the subsampled points that are far away from their nearest tooth centroids. To validate its benefits, we augment the baseline network bNet_cp with the distance-aware filter (bNet_cp-D) and compare the tooth centroid prediction results of both networks, as shown in Table 1 and Fig. 4. Quantitatively, bNet_cp-D consistently improves the tooth centroid prediction results on all metrics. Specifically, the max distance MaxD(Ĉ, C) from the predicted tooth centroid set Ĉ to the ground truth tooth centroid set C is reduced by a large margin (86.245 × 10⁻³ vs. 60.065 × 10⁻³), which demonstrates that the filtered subsampled points encode little tooth shape information and usually produce unreliable tooth centroid predictions. Correspondingly, the segmentation performance is improved (1.2% and 1.3% improvements of DSC_point and DSC_surface, respectively).
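In spirit, the filter keeps only near-tooth points as centroid voters. A minimal sketch of this step (the array names `offsets` and `pred_dist` and the default threshold are our illustration of the regressed quantities, not the paper's code):

```python
import numpy as np

def filter_and_vote(points, offsets, pred_dist, alpha=0.2):
    """Keep only subsampled points whose estimated distance to their nearest
    tooth centroid is below alpha, then translate the survivors by their
    regressed offsets to produce tooth centroid candidates."""
    mask = pred_dist < alpha
    return points[mask] + offsets[mask]
```

Points predicted to lie far from every tooth are simply dropped before voting, which is exactly the behavior visualized in Fig. 4.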
In addition, to analyze the effectiveness of the proposed distance-aware filter more comprehensively, we visualize the displacement vectors in the point cloud (the first row of Fig. 4) and their corresponding dental model segmentation results (the second row of Fig. 4). On the one hand, we have efficiently filtered the sample points that are far away from any centroid points and less
Table 1
Statistical performance of tooth centroid prediction and segmentation with different tooth centroid prediction loss terms. Here, the distance metric is computed on point cloud data normalized within a unit ball, i.e., distances are reported relative to the unit length.
Fig. 10. The segmentation performance of different methods on normal and abnormal cases. (a) DSC accuracy on the point cloud; (b) DSC accuracy on the dental surface.
likely to find an optimal position. Usually, these points will result in wrongly regressed centroid points and segmentation results, as highlighted in red in Fig. 4. On the other hand, with the learned filter, the predicted points tend to lie close to the target points, which demonstrates the clustering effect that benefits the proposal generation.
Chamfer distance loss To supervise the tooth centroid prediction, instead of the intuitive scheme that directly forces each subsampled point to move to its nearest tooth centroid, we utilize the Chamfer distance to calculate distances bidirectionally (bNet_cp-D-CD). Compared to bNet_cp-D, the mean distances MeanD(C, Ĉ) and MeanD(Ĉ, C) are reduced from 2.998 × 10⁻³ to 2.673 × 10⁻³ and from 9.350 × 10⁻³ to 6.857 × 10⁻³, demonstrating that the Chamfer distance is key to the success of our tooth detection component. Not surprisingly, it also improves the segmentation accuracy, DSC_point from 95.5% to 95.9% and DSC_surface from 96.5% to 96.8%, respectively.
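The bidirectional Chamfer distance used here can be sketched as follows (a minimal NumPy illustration; the function signature is our own, and it returns the two directional means matching MeanD(Ĉ, C) and MeanD(C, Ĉ)):

```python
import numpy as np

def chamfer_distance(pred, gt):
    """Bidirectional Chamfer distance between two centroid sets.

    pred: (M, 3) regressed centroid candidates; gt: (K, 3) ground truth
    centroids. Returns the mean nearest-neighbor distance in each
    direction: prediction-to-GT and GT-to-prediction.
    """
    # Pairwise Euclidean distances, shape (M, K).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    pred_to_gt = d.min(axis=1).mean()  # each prediction to its nearest GT
    gt_to_pred = d.min(axis=0).mean()  # each GT to its nearest prediction
    return pred_to_gt, gt_to_pred
```

Supervising both directions penalizes predictions far from any tooth (over-detection) and ground truth centroids matched by no prediction (miss-detection), which a one-way loss cannot do.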
Separation loss To validate the effectiveness of the separation loss for accurate tooth centroid prediction, especially for incisors that are crowded and packed together, we explore the alternative loss combination by augmenting bNet_cp-C-CD with a separation loss, denoted as bNet_cp-C-CD-S. Statistically, with the separation loss, MaxD(Ĉ, C) is reduced by a remarkable margin of about 28 × 10⁻³, as shown in Table 1. Although only little improvement is achieved on the other three metrics, the importance of the separation loss is presented more clearly in the visual comparison in Fig. 5. For the lower jaw dental model, bNet_cp-C-CD-S successfully predicts all correct centroids, while bNet_cp-C-CD misses one incisor, since the predicted centroids around the incisors are clustered into one group and thus miss one proposal in the proposal generation stage. Considering the small crown area of incisors, this contributes only slightly to the segmentation metrics: a 0.2% improvement of DSC_point and a 0.1% improvement of DSC_surface.
It is interesting to find that the distance metrics along the two directions differ a lot. The reason is that, under the supervision of the centroid detection losses, every ground truth centroid receives at least one regressed centroid, and most of them are close enough to the target, but a few regressed centroids are a little bit far away from the nearest tooth centroid. Thus, from Ĉ to C, the errors are bigger. The bidirectional constraints reveal that the two metrics share equal importance. A higher error of (Ĉ, C) indicates that some regressed centroids are far away from any tooth, which would lead to over-detection, while a higher error of (C, Ĉ) implies that some ground truth centroids are not detected by any regressed centroid point, which usually leads to miss-detection.
Cascaded segmentation refinement The cascading mechanism usually has a beneficial influence on image segmentation problems. To validate the efficacy of the cascading scheme in our specific task, we first use bNet_cp-C-CD-S as the tooth centroid prediction network and augment the base segmentation network (bNet_seg) with another sub-module to refine the preliminary results (denoted as bNet_seg-C). The quantitative results are listed in Table 2 for comparison. It can be seen that the cascading network bNet_seg-C consistently improves the segmentation performance on all four tooth types with higher DSC_point and DSC_surface values, especially for teeth with irregular shapes. One typical example in Fig. 6 presents the visual comparison. As highlighted in the brown boxes, the special case with extreme appearance can be robustly handled by the network bNet_seg-C. In addition, the cascade scheme also brings a 0.9% F1-score improvement for tooth identification, coming from the correctly predicted labels.
Confidence-aware segmentation refinement To validate the effectiveness of the confidence-aware cascading mechanism, we further augment bNet_seg-C with the confidence map as our final network (TSegNet), which encourages the segmentation network to pay more attention to areas with relatively low confidence, especially near the tooth boundary and in ambiguous regions with blurred geometric signals. As illustrated in Table 2, compared to the common cascade mechanism (bNet_seg-C), the confidence-aware segmentation network improves the average DSC_point and DSC_surface
Fig. 11. Segmentation results of dental models with complex appearances, including teeth missing, crowding and irregular shapes highlighted by red boxes. Four attention
maps of abnormal cases in the last row are also presented, and the red color indicates higher segmentation confidence while the blue color indicates lower segmentation
confidence. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Table 2
Numerical performance of segmentation accuracy for different segmentation network variants. The F1 scores are also included.
Methods | DSC_point [%] | DSC_surface [%] | F1 [%]
| Incisor Canine Premolar Molar Mean | Incisor Canine Premolar Molar Mean |
with increases of 0.6% and 0.8%, respectively. The qualitative results in Fig. 7 also show that TSegNet can produce more reliable segmentation results without artifacts. More representative and challenging segmentation results of TSegNet are presented in Figs. 8 and 11.
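One way to realize the confidence-guided emphasis described above is to up-weight the refinement loss where the first pass was unsure. The sketch below is our illustration under an assumed weighting scheme (w = 1 + (1 − c)); it is not the paper's exact formulation:

```python
import numpy as np

def confidence_weighted_nll(probs, labels, confidence):
    """Per-point negative log-likelihood, up-weighted where the first-pass
    confidence is low, so refinement focuses on boundary/ambiguous points.

    probs: (N, C) softmax outputs; labels: (N,) int class ids;
    confidence: (N,) values in [0, 1] from the preliminary segmentation.
    """
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    weights = 1.0 + (1.0 - confidence)  # low confidence -> up to 2x weight
    return float((weights * nll).mean())
```

Under this scheme, errors on low-confidence points (typically near tooth boundaries) dominate the loss, steering the second-stage network toward exactly the regions where the preliminary prediction is weakest.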
4.3. Comparison with state-of-the-art methods
We compare our framework with several state-of-the-art point or mesh segmentation approaches, including PointNet++ (Qi et al., 2017b), harmonic field (Zou et al., 2015) and Mask-MCNet (Zanjani et al., 2019). The first one directly takes the 3D point cloud as input and achieves state-of-the-art performance on many public segmentation datasets. The last two are specialized methods for dental model segmentation. Specifically, Zou et al. (2015) present a semi-automatic method based on geometric surface features that outperforms other traditional methods. Zanjani et al. (2019) extend Mask-RCNN to 3D point clouds and achieve the leading performance in automatic dental model segmentation. For a fair comparison, we train PointNet++, Mask-MCNet and our method with the same point cloud input (i.e., coordinates and normals). The statistical and visual comparisons are shown in Table 3 and Fig. 9, respectively.
Quantitative comparison The overall tooth segmentation, identification and running time results are summarised in Table 3, where our framework significantly outperforms other state-of-the-art methods by a large margin. Concretely, compared with the backbone network PointNet++, our method leads to 11.9%, 10.9% and 6.8% improvements of DSC_point, DSC_surface and F1 score, which demonstrates the effectiveness of the network architecture and loss design. Moreover, our framework inherits the efficiency of PointNet++, as it shows comparable running time (0.8 s vs. 0.3 s). Although Zou et al. (2015) proposed a semi-automatic method that employs a harmonic field on the crown surface and high-level semantic information manually provided by users, our fully automatic framework still outperforms it in terms of segmentation accuracy and running time. Note that the harmonic field based method is unable to predict tooth identification automatically. Lastly, it is observed that our approach achieves better results than Mask-MCNet, which is a state-of-the-art learning based method for this specific task. In particular, our method boosts the segmentation accuracy from 91.5% to 98.0% (DSC_point) and from 92.5% to 98.6% (DSC_surface), and the F1 score of tooth identification from 91.2% to 94.2%. Meanwhile, because Mask-MCNet is an anchor-based method that has to crop the dental model into several patches,
our anchor-free method is more efficient and nearly 25 times faster.

We also quantify segmentation results per tooth type in Table 3. It can be seen that PointNet++ only obtains 73.0% DSC_point and 76.5% DSC_surface for premolar teeth, which is much lower compared to other types. The reason is that most patients seeking orthodontic treatment are in the tooth exfoliation period and usually have an unfixed number of premolars. In addition, young children do not have premolars because these teeth do not grow until they are around 10 years old. Thus, clustering-based learning methods, such as PointNet++ (Qi et al., 2017b), MeshSegNet (Lian et al., 2019; 2020) and TGCNN (Xu et al., 2018), cannot robustly handle cases with missing teeth, even though this is a common situation in real-world clinics.
To further demonstrate the robustness of our proposed method, we construct two testing subsets containing the abnormal (206 dental models) and normal cases (194 dental models) based on our testing dataset (400 dental models). As shown in Fig. 10, our method robustly handles the abnormal cases, and the two DSC metrics change only mildly between the normal and abnormal subsets. However, the performance of PointNet++ and Mask-MCNet drops rapidly on abnormal cases, due to their lack of ability to handle abnormal cases with tooth crowding, missing and misalignment problems. It is also worth noting that Harmonic Field is a semi-automatic method, where additional human input helps process the abnormal cases to some extent, but it still struggles to find accurate tooth boundaries.
he accurate tooth boundaries.
Qualitative comparison The visual comparison results are shown
n Fig. 9 for upper and lower jaws. It can be observed that seg-
entation results produced by our method match better with
he ground truth, especially for extreme cases, such as additional
races (the third row) or crowding teeth (the fourth row). Notably,
ointNet++ and Mask-MCNet, usually produce lots of artifacts in
he tooth body and boundary areas. This shows that high-level fea-
ures extracted by such methods are not reliable when the dental
odel has blurred geometric signals. Besides, the harmonic field
ased method heavily depends on human interactions and is sen-
itive to variations of tooth shape appearances. For example, it fails
o extract a complete tooth body when the tooth surface is com-
licated (as highlighted by red arrows in the fourth column of
ig. 9 ). The qualitative results shown in Fig. 9 are consistent with
he quantitative comparison, which further demonstrates the effec-
iveness and efficiency of our framework for automatic tooth seg-
entation and identification on dental models.
4.4. Discussions
In clinical practice, automatic dental model segmentation is an essential yet challenging problem in computer-aided orthodontics. Many algorithms, including traditional and deep learning based methods, have been explored to extract individual teeth from the dental model. However, these methods cannot robustly handle typical cases with extreme appearance before orthodontic treatment. In this paper, we propose a two-stage framework with a distance-aware centroid prediction module and a confidence-aware cascade segmentation module to successfully extract all teeth from dental models with large variations.
Parameter analysis In total, there are five core hyper-parameters used in our method. To analyze the robustness of our method to these parameters, we conduct five experiments, as shown in Fig. 12, with different parameter settings and report the statistics in terms of segmentation accuracy. Specifically, our method achieves comparable segmentation results when changing the number (N = 16,000) of input sampled points (Fig. 12(a)). The reason is that after the first set abstraction layer of the point cloud encoder, 1024 points are sampled via farthest point sampling to encode the local context, which is far smaller, and the distribution is similar given different numbers of input points. Overall, our network is insensitive to the choice of N. As for the number (n = 4096) of cropped points within a tooth proposal, it should ensure that a complete foreground tooth is included without too many background points. Hence, when varying n to a smaller or bigger value than 4096, a slight performance degradation appears (Fig. 12(b)). A similar curve tendency can be observed in Fig. 12(a), since the two parameters n and N are tightly coupled. In our configuration, N = 16,000 and n = 4096 achieved the best performance. In the tooth centroid detection stage, M = 256 candidate points are regressed to indicate tooth objects. Thus, if M is small, some teeth with small crown areas, e.g., the lower central incisor, may be miss-detected. As illustrated in Fig. 12(c), the performance drops rapidly when M is set to 64. Instead, when M is set to 256 or bigger, it is sufficient to capture all tooth objects in the dental model and no performance fluctuation occurs. Meanwhile, in the distance-aware filter, the distance threshold α is set to 0.2, which is consistent with the receptive field of the last set abstraction layer in the encoder. A smaller α would filter more candidate points and lead to miss-detection, while a bigger α usually takes points far from any tooth into consideration and leads to over-detection, as shown in Fig. 12(d). The last core parameter is the distance threshold l = 0.015 in the DBSCAN clustering algorithm. Due to the strong clustering effect achieved by our method in the tooth centroid detection stage, the result is insensitive to the choice of l (Fig. 12(e)).
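The clustering step that turns the M regressed candidates into one proposal per tooth can be sketched as follows. The paper uses DBSCAN (Ester et al., 1996) with distance threshold l = 0.015; the snippet below is a simplified density-based stand-in written in pure NumPy (connected components under the eps radius, assuming a non-empty candidate set), not the full DBSCAN algorithm:

```python
import numpy as np

def cluster_centroids(candidates, eps=0.015):
    """Group regressed centroid candidates into tooth proposals.

    Two candidates belong to the same cluster if they are connected by a
    chain of points, each pair within distance eps. Returns the mean of
    each cluster as one tooth centroid proposal.
    """
    m = len(candidates)
    # Adjacency: points within eps of each other are neighbors.
    d = np.linalg.norm(candidates[:, None] - candidates[None, :], axis=-1)
    adj = d <= eps
    labels = -np.ones(m, dtype=int)
    k = 0
    for i in range(m):
        if labels[i] >= 0:
            continue
        stack = [i]  # flood-fill the connected component starting at i
        while stack:
            j = stack.pop()
            if labels[j] >= 0:
                continue
            labels[j] = k
            stack.extend(np.nonzero(adj[j])[0].tolist())
        k += 1
    return np.stack([candidates[labels == c].mean(axis=0) for c in range(k)])
```

Because the centroid losses already pull candidates into tight groups around each tooth, the grouping is stable over a range of eps values, consistent with the insensitivity to l reported above.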
Centroid vs. bounding box In the first stage of our network, instead of utilizing a bounding box, an intuitive way to indicate an object in 2D or 3D images, we design a centroid voting method to detect and represent each tooth. To investigate its effectiveness, we conduct an experiment to compare the two representations by only replacing the centroid prediction module with a bounding box regression module in TSegNet. As illustrated in Fig. 13, a premolar tooth fails to be detected by any bounding box. This is because the predicted bounding box of the premolar tooth has a relatively large overlap with the bounding box of the neighboring molar tooth and is filtered by the non-maximum suppression (NMS) operator that is designed to remove redundant boxes. In addition, there is under-segmentation on the canine tooth because the corresponding bounding box does not cover the tooth appropriately. Generally, the centroid representation has two main advantages over the bounding box representation in this task. First, centroid voting and detection is more efficient and accurate than the NMS operator at filtering redundant bounding boxes. Second, the tooth centroid is more stable information and less sensitive to the tooth shape boundary, while the bounding box is mainly decided by the tooth shape with blurred geometric signals.
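The failure mode described above follows directly from how greedy NMS works. A minimal sketch on axis-aligned 3D boxes (our illustration of standard NMS, not the compared method's code) shows how a premolar box overlapping a higher-scoring molar box gets discarded:

```python
import numpy as np

def nms_3d(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression on axis-aligned 3D boxes.

    boxes: (N, 6) rows of [xmin, ymin, zmin, xmax, ymax, zmax].
    Returns indices of kept boxes, highest score first.
    """
    def iou(a, b):
        lo = np.maximum(a[:3], b[:3])
        hi = np.minimum(a[3:], b[3:])
        inter = np.prod(np.clip(hi - lo, 0, None))
        vol = lambda x: np.prod(x[3:] - x[:3])
        return inter / (vol(a) + vol(b) - inter)

    keep = []
    for i in np.argsort(scores)[::-1]:  # visit boxes by descending score
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

When two adjacent teeth yield heavily overlapping boxes, only the higher-scoring one survives, whereas centroid votes for the two teeth form two separate clusters and both proposals are retained.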
Teeth missing problem Missing teeth is a common problem in dental clinics. As shown in Fig. 14, there are two typical missing-teeth cases, referencing the normal case in Fig. 14(a). In Fig. 14(b), a cuspid tooth is missing in the left half and a premolar in the right half, but visually there is no vacant position. In Fig. 14(c), a cuspid tooth is missing on the left with a corresponding vacant position. Generally, it is challenging for semantic-based learning methods, e.g., PointNet++, to successfully process the abnormal case in Fig. 14(b). However, instead of relying on the global semantic context encoded in the input point cloud, we regress a tooth centroid from the local points around the target tooth. Thus, with the proposed distance filter, the Chamfer distance loss and the separation loss, our method can accurately regress the centroid points to indicate the tooth objects confidently.
Fig. 12. The tooth segmentation performance of our TSegNet when changing the value of different parameters.

Fig. 13. Visual comparison between the bounding box and tooth centroid representations. (a) The predicted bounding boxes and corresponding segmentation results. (b) The predicted tooth centroids and corresponding segmentation results.

Fig. 14. A normal case and two typical examples with missing teeth. (a) A normal case; (b) missing a cuspid tooth in the left half and a premolar in the right half; (c) missing a cuspid tooth in the left half.

Fig. 15. Failure cases. (a) The wisdom tooth with abnormal shape fails to be detected. (b) The rudimentary tooth is hard to segment accurately due to its small crown area.

Limitation Although our proposed framework has achieved outstanding tooth segmentation results and outperforms many state-of-the-art methods, it presents some limitations that are worth considering. One typical example is that it tends to yield incomplete tooth segmentation in some cases, such as the wisdom tooth and the rudimentary tooth. One possible reason is that these cases are quite rare and seldom seen by the network during the training phase. Specifically, the wisdom tooth is a special case for humans, because it has large variations and usually only a small part of the crown appears on the dental model. As shown in Fig. 15(a), we fail to detect the wisdom tooth marked in the brown box. Another case is the rudimentary tooth; it shares a similar situation in that the visible crown part has a small area and is quite different from other teeth. Thus, some background area is likely to be treated as part of a rudimentary tooth, as illustrated in Fig. 15(b). In the future, we would like to explore more effective methods on datasets with imbalanced tooth type distributions.

5. Conclusion

In this work, we develop a novel fully automatic algorithm to segment teeth on 3D dental models, guided by tooth centroid information. The algorithm builds upon a two-stage neural network containing a robust tooth centroid prediction subnetwork and a single tooth segmentation subnetwork with our novel components and loss functions. We have evaluated our algorithm both qualitatively and quantitatively, and compared it with state-of-the-art learning and non-learning based methods, where our method produces superior results and significantly outperforms others.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement

Zhiming Cui: Methodology, Software, Writing - original draft. Changjian Li: Methodology, Writing - original draft, Software. Nenglun Chen: Methodology, Writing - original draft. Guodong Wei: Methodology, Data curation. Runnan Chen: Methodology. Yuanfeng Zhou: Data curation, Writing - original draft. Wenping Wang: Supervision, Writing - original draft.
References

Cobourne, M.T., DiBiase, A.T., 2015. Handbook of Orthodontics. Elsevier Health Sciences.
Cui, Z., Li, C., Wang, W., 2019. ToothNet: automatic tooth instance segmentation and identification from cone beam CT images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6368–6377.
Ester, M., Kriegel, H.P., Sander, J., Xu, X., et al., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226–231.
Grace, M., 2000. Dental notation. Br. Dent. J. 188, 229.
Grzegorzek, M., Trierscheid, M., Papoutsis, D., Paulus, D., 2010. A multi-stage approach for 3D teeth segmentation from dentition surfaces. In: International Conference on Image and Signal Processing. Springer, pp. 521–530.
Hajeer, M., Millett, D., Ayoub, A., Siebert, J., 2004. Applications of 3D imaging in orthodontics: part I. J. Orthod. 31, 62–70.
Hajeer, M., Millett, D., Ayoub, A., Siebert, J., 2004. Applications of 3D imaging in orthodontics: part II. J. Orthod. 31, 154–162.
He, K., Gkioxari, G., Dollár, P., Girshick, R., 2017. Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969.
Hou, J., Dai, A., Nießner, M., 2019. 3D-SIS: 3D semantic instance segmentation of RGB-D scans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4421–4430.
Kondo, T., Ong, S.H., Foong, K.W., 2004. Tooth segmentation of dental study models using range images. IEEE Trans. Med. Imaging 23, 350–362.
Kronfeld, T., Brunner, D., Brunnett, G., 2010. Snake-based segmentation of teeth from virtual dental casts. Comput. Aided Des. Appl. 7, 221–233.
Kumar, Y., Janardan, R., Larson, B., Moon, J., 2011. Improved segmentation of teeth in dental models. Comput. Aided Des. Appl. 8, 211–224.
Lechuga, L., Weidlich, G.A., 2016. Cone beam CT vs. fan beam CT: a comparison of image quality and dose delivered between two differing CT imaging modalities. Cureus 8 (9), E778.
Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B., 2018. PointCNN: convolution on X-transformed points. In: Advances in Neural Information Processing Systems, pp. 820–830.
Li, Z., Ning, X., Wang, Z., 2007. A fast segmentation method for STL teeth model. In: 2007 IEEE/ICME International Conference on Complex Medical Engineering. IEEE, pp. 163–166.
Lian, C., Wang, L., Wu, T.H., Liu, M., Durán, F., Ko, C.C., Shen, D., 2019. MeshSNet: deep multi-scale mesh feature learning for end-to-end tooth labeling on 3D dental surfaces. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 837–845.
Lian, C., Wang, L., Wu, T.H., Wang, F., Yap, P.T., Ko, C.C., Shen, D., 2020. Deep multi-scale mesh feature learning for automated labeling of raw dental surfaces from 3D intraoral scanners.
Opitz, J., Burst, S., 2019. Macro F1 and macro F1. arXiv preprint arXiv:1911.03347.
Qi, C.R., Su, H., Mo, K., Guibas, L.J., 2017. PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 652–660.
Qi, C.R., Yi, L., Su, H., Guibas, L.J., 2017. PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems, pp. 5099–5108.
Sinthanayothin, C., Tharanont, W., 2008. Orthodontics treatment simulation by teeth segmentation and setup. In: 2008 5th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology. IEEE, pp. 81–84.
Sun, D., Pei, Y., Song, G., Guo, Y., Ma, G., Xu, T., Zha, H., 2020. Tooth segmentation and labeling from digital dental casts. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, pp. 669–673.
Tian, S., Dai, N., Zhang, B., Yuan, F., Yu, Q., Cheng, X., 2019. Automatic classification and segmentation of teeth on 3D dental model using hierarchical deep learning networks. IEEE Access 7, 84817–84828.
Wongwaen, N., Sinthanayothin, C., 2010. Computerized algorithm for 3D teeth segmentation. In: 2010 International Conference on Electronics and Information Engineering. IEEE, pp. V1-277.
Wu, K., Chen, L., Li, J., Zhou, Y., 2014. Tooth segmentation on dental meshes using morphologic skeleton. Comput. Graph. 38, 199–211.
Wu, W., Qi, Z., Fuxin, L., 2019. PointConv: deep convolutional networks on 3D point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9621–9630.
Xu, X., Liu, C., Zheng, Y., 2018. 3D tooth segmentation and labeling using deep convolutional neural networks.
Yamany, S.M., El-Bialy, A.M., 1999. Efficient free-form surface representation with application in orthodontics. In: Three-Dimensional Image Capture and Applications II. International Society for Optics and Photonics, pp. 115–124.
Yaqi, M., Zhongke, L., 2010. Computer aided orthodontics treatment by virtual segmentation and adjustment. In: 2010 International Conference on Image Analysis and Signal Processing. IEEE, pp. 336–339.
Yuan, T., Liao, W., Dai, N., Cheng, X., Yu, Q., 2010. Single-tooth modeling for 3D dental model. J. Biomed. Imaging 2010, 9.
Zanjani, F.G., Moin, D.A., Claessen, F., Cherici, T., Parinussa, S., Pourtaherian, A., Zinger, S., et al., 2019. Mask-MCNet: instance segmentation in 3D point cloud of intra-oral scans. In: 22nd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2019). Springer.
Zhao, M., Ma, L., Tan, W., Nie, D., 2006. Interactive tooth segmentation of dental models. In: 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference. IEEE, pp. 654–657.
Zhou, Y., Tuzel, O., 2018. VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4490–4499.
Zou, B.j., Liu, S.j., Liao, S.h., Ding, X., Liang, Y., 2015. Interactive tooth partition of dental mesh base on tooth-target harmonic field. Comput. Biol. Med. 56,