Online Appendix to: Integrating Social and Auxiliary Semantics for Multi-Faceted Topic Modeling in Twitter

1. Clustering results

Corresponding figure in paper: Fig. 10.

Direct

Dataset ML

Model result (NMI) Statistical significance of MfTM
K LDA T-LDA TOT DLDA MfTM DLDA 50 DLDA 100 DLDA 150 DLDA 200 LDA 50 LDA 100 LDA 150 LDA 200 TOT 50 TOT 100 TOT 150 TOT 200 T-LDA 50 T-LDA 100 T-LDA 150 T-LDA 200
50 0.343 0.376 0.410 0.314 0.408 *** **
100 0.379 0.384 0.412 0.387 0.437 *** *- * * *** *- *- *
150 0.400 0.406 0.403 0.388 0.440 *** *- *- * *** *- * *
200 0.424 0.394 0.401 0.390 0.442 *** ** ** *- *** ** *- *- * * * *- * *

Dataset HL

Model result (NMI) Statistical significance of MfTM
K LDA T-LDA TOT DLDA MfTM DLDA 50 DLDA 100 DLDA 150 DLDA 200 LDA 50 LDA 100 LDA 150 LDA 200 TOT 50 TOT 100 TOT 150 TOT 200 T-LDA 50 T-LDA 100 T-LDA 150 T-LDA 200
50 0.479 0.274 0.451 0.268 0.555 *** *** *** *** *- * *** ** *- *** *** *** ***
100 0.498 0.397 0.498 0.362 0.585 *** *** *** *** ** *- * * *** *** *** ** *** *** *** ***
150 0.526 0.411 0.503 0.379 0.598 *** *** *** *** ** ** *- *- *** *** *** *** *** *** *** ***
200 0.522 0.432 0.530 0.397 0.618 *** *** *** *** *** ** *- ** *** *** *** *** *** *** *** ***

K-means

Dataset ML

Model result (NMI) Statistical significance of MfTM
K TFIDF LDA T-LDA TOT DLDA MfTM TFIDF DLDA 50 DLDA 100 DLDA 150 DLDA 200 LDA 50 LDA 100 LDA 150 LDA 200 TOT 50 TOT 100 TOT 150 TOT 200 T-LDA 50 T-LDA 100 T-LDA 150 T-LDA 200
50 0.312 0.369 0.358 0.375 0.398 0.421 *** * ** *- ** *** *** *** *** *** *** *** *** *** *** *** ***
100 0.312 0.345 0.356 0.362 0.397 0.415 *** *- * ** *** *** *** *** ** *** *** *** *** *** *** ***
150 0.312 0.340 0.348 0.353 0.398 0.422 *** *- ** *- ** *** *** *** *** *** *** *** *** *** *** *** ***
200 0.312 0.324 0.352 0.339 0.383 0.421 *** * *- *- ** *** *** *** *** *** *** *** *** *** *** *** ***

Dataset HL

Model result (NMI) Statistical significance of MfTM
K TFIDF LDA T-LDA TOT DLDA MfTM TFIDF DLDA 50 DLDA 100 DLDA 150 DLDA 200 LDA 50 LDA 100 LDA 150 LDA 200 TOT 50 TOT 100 TOT 150 TOT 200 T-LDA 50 T-LDA 100 T-LDA 150 T-LDA 200
50 0.393 0.463 0.268 0.469 0.355 0.573 *** *** *** *** *** ** ** *** *** *** *** *** *** *** *** *** ***
100 0.393 0.449 0.305 0.448 0.375 0.576 *** *** *** *** *** ** ** ** *** *** *** *** *** *** *** *** ***
150 0.393 0.448 0.299 0.461 0.379 0.563 *** *** *** *** *** ** ** ** ** *** *** *** *** *** *** *** ***
200 0.393 0.427 0.307 0.444 0.379 0.586 *** *** *** *** *** ** ** *** *** *** *** *** *** *** *** *** ***

SPIC

Dataset ML

Model result (NMI) Statistical significance of MfTM
K TFIDF LDA T-LDA TOT DLDA MfTM TFIDF DLDA 50 DLDA 100 DLDA 150 DLDA 200 LDA 50 LDA 100 LDA 150 LDA 200 TOT 50 TOT 100 TOT 150 TOT 200 T-LDA 50 T-LDA 100 T-LDA 150 T-LDA 200
50 0.487 0.338 0.390 0.338 0.237 0.520 *** *** *** *** *** *** *- *** *** *** *** *** ** * *-
100 0.487 0.389 0.420 0.293 0.308 0.571 *- *** *** *** *** *** *** *** ** *** *** *** *** *** *** ** **
150 0.487 0.469 0.463 0.266 0.356 0.552 *- *** *** *** *** *** *** ** *- *** *** *** *** *** *** ** **
200 0.487 0.499 0.460 0.256 0.393 0.515 *** *** *** ** *** ** *** *** *** *** ** *-

Dataset HL

Model result (NMI) Statistical significance of MfTM
K TFIDF LDA T-LDA TOT DLDA MfTM TFIDF DLDA 50 DLDA 100 DLDA 150 DLDA 200 LDA 50 LDA 100 LDA 150 LDA 200 TOT 50 TOT 100 TOT 150 TOT 200 T-LDA 50 T-LDA 100 T-LDA 150 T-LDA 200
50 0.689 0.504 0.317 0.473 0.281 0.737 *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
100 0.689 0.543 0.453 0.488 0.398 0.776 *- *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
150 0.689 0.563 0.462 0.460 0.438 0.762 * *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
200 0.689 0.562 0.482 0.475 0.478 0.788 *- *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

DBSCAN

Dataset ML

Model result (NMI) Statistical significance of MfTM
K TFIDF LDA T-LDA TOT DLDA MfTM TFIDF DLDA 50 DLDA 100 DLDA 150 DLDA 200 LDA 50 LDA 100 LDA 150 LDA 200 TOT 50 TOT 100 TOT 150 TOT 200 T-LDA 50 T-LDA 100 T-LDA 150 T-LDA 200
50 0.398 0.283 0.348 0.386 0.333 0.403 * ** **
100 0.398 0.387 0.358 0.369 0.336 0.452 * *** ** *** *** *** *- ** ** *- ** *- *- *** *- ** *-
150 0.398 0.369 0.362 0.372 0.334 0.445 ** ** *** *** ** *- ** *- * *- *- *- ** *- *- *
200 0.398 0.368 0.361 0.371 0.296 0.456 *- *** *** *** *** *** ** *** ** *- ** ** *- *** *- ** *-

Dataset HL

Model result (NMI) Statistical significance of MfTM
K TFIDF LDA T-LDA TOT DLDA MfTM TFIDF DLDA 50 DLDA 100 DLDA 150 DLDA 200 LDA 50 LDA 100 LDA 150 LDA 200 TOT 50 TOT 100 TOT 150 TOT 200 T-LDA 50 T-LDA 100 T-LDA 150 T-LDA 200
50 0.548 0.443 0.499 0.688 0.487 0.790 ** *** *** *** *** *** *** *** *** ** *** ** *** *** *** *** ***
100 0.548 0.545 0.520 0.679 0.477 0.769 *- ** ** ** ** *** *- ** ** ** *- *- *-
150 0.548 0.483 0.510 0.702 0.423 0.783 *** *** *** *** *** ** *** *** *** *- ** *- ** *** *** *** ***
200 0.548 0.439 0.525 0.686 0.402 0.750 *- ** ** ** ** ** *- ** ** ** *- *- *-
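
All tables in this section report normalized mutual information (NMI) between the produced clusters and the ground-truth labels. As a reminder of how such a score can be computed, here is a minimal sketch. It assumes the arithmetic-mean normalization NMI = 2 I(U;V) / (H(U) + H(V)), which may differ from the normalization used in the paper; the function name `nmi` and the toy labels are illustrative only.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of an empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # Mutual information between the two labelings.
    mutual_info = sum(
        (c / n) * math.log((c * n) / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )

    denom = entropy(count_true) + entropy(count_pred)
    # Both labelings are a single cluster: treat them as identical.
    return 1.0 if denom == 0 else 2.0 * mutual_info / denom


# Toy labelings: a relabeled copy of the truth, and an unrelated split.
same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI is invariant to cluster relabeling, which is why it suits comparing clustering output with class labels.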

2. Semantic enrichment

Web-document-based SE

Corresponding figure in paper: Fig. 8 (a)-(b).

SE amount  K-means - ML  K-means - HL  DBSCAN - ML  DBSCAN - HL
0H_10W     0.3970        0.5382        0.4198       0.7658
0H_20W     0.4093        0.5455        0.4234       0.7715
0H_30W     0.4103        0.5569        0.4127       0.7625
0H_40W     0.4119        0.5611        0.4393       0.7579
0H_50W     0.4206        0.5566        0.4200       0.7683

Hashtag-based SE

Corresponding figure in paper: Fig. 8 (c)-(d).

SE amount  K-means - ML       K-means - HL       DBSCAN - ML        DBSCAN - HL
           TimeSens  Naïve    TimeSens  Naïve    TimeSens  Naïve    TimeSens  Naïve
0H_10W     0.3970    0.3970   0.5382    0.5382   0.4198    0.4198   0.7658    0.7658
10H_10W    0.4140    0.3942   0.5682    0.5591   0.4483    0.4271   0.7512    0.7552
20H_10W    0.4059    0.3977   0.5666    0.5603   0.4518    0.4345   0.7697    0.7602
30H_10W    0.4081    0.3972   0.5724    0.5641   0.4617    0.4549   0.7748    0.7579
40H_10W    0.4085    0.3983   0.5614    0.5668   0.4351    0.4503   0.7689    0.7532
50H_10W    0.4104    0.3975   0.5645    0.5585   0.4384    0.4430   0.7642    0.7643

3. Utility of named entities and timestamps

Corresponding figure in paper: Fig. 11.

Dataset ML

              NMI                        Utility
              Direct   K-means  DBSCAN   Direct   K-means  DBSCAN
All           0.4371   0.4151   0.4516   0.0000   0.0000   0.0000
person        0.3975   0.3906   0.4230   0.0396   0.0245   0.0285
organization  0.3963   0.3964   0.4251   0.0408   0.0187   0.0265
location      0.4119   0.3925   0.4396   0.0252   0.0226   0.0119
timestamp     0.3938   0.3863   0.4241   0.0433   0.0288   0.0274

Dataset HL

              NMI                        Utility
              Direct   K-means  DBSCAN   Direct   K-means  DBSCAN
All           0.5850   0.5760   0.7686   0.0000   0.0000   0.0000
person        0.5858   0.5737   0.7394   -0.0008  0.0022   0.0292
organization  0.5769   0.5654   0.7203   0.0080   0.0105   0.0483
location      0.5634   0.5518   0.7396   0.0216   0.0242   0.0290
timestamp     0.5049   0.4863   0.6357   0.0800   0.0897   0.1329
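
The Utility columns above agree, to within 0.0001, with the NMI of the "All" row minus the NMI listed in the same row. The short check below reproduces the Direct column of Dataset ML under that reading. The definition is inferred from the numbers rather than stated in this appendix, and the helper name `utility` is illustrative.

```python
# NMI values copied from the "Direct" column of Dataset ML.
nmi_ml_direct = {
    "All": 0.4371,
    "person": 0.3975,
    "organization": 0.3963,
    "location": 0.4119,
    "timestamp": 0.3938,
}


def utility(nmi_by_row, baseline="All"):
    """Assumed definition: baseline NMI minus the NMI of each row."""
    base = nmi_by_row[baseline]
    return {name: round(base - value, 4) for name, value in nmi_by_row.items()}


ml_direct_utility = utility(nmi_ml_direct)
```

Under this reading a larger utility means a larger drop from the "All" configuration, and a slightly negative value (as for person on Dataset HL, Direct) means the row's NMI exceeds that of "All".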

4. Perplexity

MfTM: Perplexity of online inference and Gibbs sampling

Corresponding figure in paper: Fig. 6.

No. of processed posts Online MfTM (K=50) Gibbs MfTM (K=50) Online MfTM (K=100) Gibbs MfTM (K=100)
9000 38954.05291 5436.854216 433560.7559 5394.711381
18000 11412.08046 5436.854216 168259.4554 5394.711381
28000 9915.462608 5436.854216 123737.2801 5394.711381
37000 8522.545313 5436.854216 98436.51515 5394.711381
46000 8360.735552 5436.854216 93890.7899 5394.711381
55000 8206.932254 5436.854216 81969.27728 5394.711381
65000 8217.231083 5436.854216 72794.09422 5394.711381
74000 7850.013603 5436.854216 65468.59898 5394.711381
83000 7619.565621 5436.854216 57456.54233 5394.711381
92000 7832.040264 5436.854216 50867.56732 5394.711381
102000 7656.407878 5436.854216 48435.60101 5394.711381
111000 7314.840181 5436.854216 45652.16889 5394.711381
120000 6952.172525 5436.854216 43837.89957 5394.711381
129000 6911.282114 5436.854216 40223.5684 5394.711381
139000 6652.048845 5436.854216 38213.09417 5394.711381
148000 6359.11887 5436.854216 33726.94741 5394.711381
157000 6378.679089 5436.854216 32082.00939 5394.711381
166000 6251.727982 5436.854216 29148.06264 5394.711381
176000 6298.507438 5436.854216 28487.10686 5394.711381
185000 6021.331931 5436.854216 27810.17761 5394.711381
194000 6229.49054 5436.854216 27507.86676 5394.711381
203000 6269.612359 5436.854216 26179.01307 5394.711381
213000 6505.408664 5436.854216 24487.13875 5394.711381
222000 6516.804255 5436.854216 23619.28939 5394.711381
231000 6548.422809 5436.854216 21628.86641 5394.711381
240000 6838.797775 5436.854216 19936.58102 5394.711381
250000 6504.31229 5436.854216 19000.39792 5394.711381
259000 6404.878071 5436.854216 18904.3863 5394.711381
268000 6296.096943 5436.854216 19083.60995 5394.711381
277000 6344.933808 5436.854216 18556.29485 5394.711381
287000 6003.767901 5436.854216 18829.10748 5394.711381
296000 5672.863434 5436.854216 19336.59254 5394.711381
305000 5572.085299 5436.854216 18675.15085 5394.711381
314000 5243.437613 5436.854216 18450.46253 5394.711381
324000 5281.43218 5436.854216 17743.68828 5394.711381
333000 4976.244005 5436.854216 17871.43465 5394.711381
342000 4923.813338 5436.854216 16745.03062 5394.711381
351000 4936.621638 5436.854216 16279.57859 5394.711381
361000 5228.118139 5436.854216 15595.54858 5394.711381
370000 5017.67584 5436.854216 15334.84515 5394.711381
379000 5165.453349 5436.854216 15453.06173 5394.711381
388000 5310.989257 5436.854216 14458.94203 5394.711381
398000 5310.232177 5436.854216 12270.04187 5394.711381
407000 5278.435048 5436.854216 13311.25391 5394.711381
416000 5038.958705 5436.854216 12698.37858 5394.711381
425000 5003.200908 5436.854216 12534.04177 5394.711381
435000 4879.546328 5436.854216 12117.14531 5394.711381
444000 4789.460008 5436.854216 12803.74382 5394.711381
453000 4625.591986 5436.854216 13851.62926 5394.711381
462000 4759.306322 5436.854216 13846.26884 5394.711381
472000 4656.385148 5436.854216 13676.13343 5394.711381
481000 4560.016936 5436.854216 12856.69776 5394.711381
490000 4487.351773 5436.854216 12629.99905 5394.711381
499000 4648.488226 5436.854216 12997.07871 5394.711381
509000 4655.472425 5436.854216 11421.4722 5394.711381
518000 4594.198427 5436.854216 12067.07643 5394.711381
527000 4706.580796 5436.854216 11956.88068 5394.711381
536000 4585.309551 5436.854216 11786.46867 5394.711381
546000 4438.855613 5436.854216 11312.7086 5394.711381
555000 4409.223859 5436.854216 9758.927789 5394.711381
564000 4328.138007 5436.854216 10218.88819 5394.711381
573000 4319.5537 5436.854216 10247.98738 5394.711381
583000 4262.659429 5436.854216 10300.74324 5394.711381
592000 4155.319315 5436.854216 11431.82246 5394.711381
601000 4212.170698 5436.854216 10931.35664 5394.711381
610000 4355.414886 5436.854216 10846.03247 5394.711381
620000 4443.916626 5436.854216 10556.711 5394.711381
629000 4449.646778 5436.854216 10534.19834 5394.711381
638000 4293.125583 5436.854216 11083.89359 5394.711381
647000 4293.660137 5436.854216 11190.4455 5394.711381
657000 4153.918992 5436.854216 10967.45041 5394.711381
666000 4284.363564 5436.854216 10090.8466 5394.711381
675000 4462.040812 5436.854216 10170.32119 5394.711381
684000 4278.325172 5436.854216 9990.436615 5394.711381
694000 4249.490423 5436.854216 8859.10066 5394.711381
703000 4034.002043 5436.854216 9267.554392 5394.711381
712000 3924.484991 5436.854216 9472.557061 5394.711381
721000 4169.707689 5436.854216 8991.771543 5394.711381
731000 4179.127236 5436.854216 8965.612967 5394.711381
740000 4074.761661 5436.854216 8645.167622 5394.711381
749000 4093.680471 5436.854216 8493.63616 5394.711381
758000 4035.394378 5436.854216 8733.662126 5394.711381
768000 4107.239718 5436.854216 9158.293843 5394.711381
777000 4324.118287 5436.854216 8938.728955 5394.711381
786000 4232.039823 5436.854216 9100.168502 5394.711381
795000 4248.492108 5436.854216 8818.786074 5394.711381
804000 4156.021246 5436.854216 8673.538321 5394.711381
814000 4008.661253 5436.854216 8279.599484 5394.711381
823000 4316.635901 5436.854216 8150.832293 5394.711381
832000 4310.130945 5436.854216 7871.37286 5394.711381
841000 4358.392727 5436.854216 10255.14363 5394.711381
851000 4310.564317 5436.854216 10366.42932 5394.711381
860000 4107.712683 5436.854216 10206.95941 5394.711381
869000 4131.141379 5436.854216 10015.74029 5394.711381
878000 4247.599424 5436.854216 10099.86774 5394.711381
888000 4300.235717 5436.854216 9839.907128 5394.711381
897000 4478.040326 5436.854216 9840.009603 5394.711381
906000 4517.02726 5436.854216 9806.945239 5394.711381
915000 4300.984055 5436.854216 9742.078066 5394.711381
925000 4328.470732 5436.854216 9822.567687 5394.711381
934000 4322.412 5436.854216 9874.627654 5394.711381
943000 4253.386 5436.854216 7265.004413 5394.711381
952000 4250.155 5436.854216 7012.479024 5394.711381
962000 4194.836 5436.854216 7102.557604 5394.711381
971000 4163.516 5436.854216 7098.197686 5394.711381
980000 4160.676 5436.854216 6896.701516 5394.711381
989000 4147.478 5436.854216 6903.719135 5394.711381
999000 4085.938 5436.854216 7273.609172 5394.711381
1008000 4041.14 5436.854216 6958.094994 5394.711381
1017000 4044.412 5436.854216 6893.089884 5394.711381
1026000 3905.233 5436.854216 6873.24313 5394.711381
1036000 3887.984 5436.854216 6998.064377 5394.711381
1045000 3904.338 5436.854216 7152.819602 5394.711381
1054000 3906.964 5436.854216 7683.427099 5394.711381
1063000 3847.375 5436.854216 7193.110886 5394.711381
1073000 3932.91 5436.854216 6619.446467 5394.711381
1082000 3848.14 5436.854216 6544.883039 5394.711381
1091000 3984.966 5436.854216 7143.350273 5394.711381
1100000 3909.894 5436.854216 7499.79215 5394.711381
1110000 4009.298 5436.854216 7595.999789 5394.711381
1119000 4249.073 5436.854216 7597.370657 5394.711381
1128000 4420.327 5436.854216 8153.946378 5394.711381
1137000 4340.078 5436.854216 8169.412494 5394.711381
1147000 4282.924 5436.854216 8255.788492 5394.711381
1156000 4437.297 5436.854216 7816.793279 5394.711381
1165000 4363.984 5436.854216 8109.29166 5394.711381
1174000 4253.124775 5436.854216 8495.123885 5394.711381
1184000 4177.880815 5436.854216 8534.350791 5394.711381
1193000 4067.586191 5436.854216 8056.088784 5394.711381
1202000 4102.249209 5436.854216 7565.178858 5394.711381
1211000 4158.634089 5436.854216 7810.974793 5394.711381
1221000 4162.303369 5436.854216 7892.355234 5394.711381
1230000 4064.209298 5436.854216 7407.609035 5394.711381
1239000 5030.033174 5436.854216 8662.947151 5394.711381
1248000 4854.904152 5436.854216 8315.425165 5394.711381
1258000 4855.49659 5436.854216 8242.389028 5394.711381
1267000 4856.068752 5436.854216 8336.052782 5394.711381
1276000 4809.227949 5436.854216 8157.917736 5394.711381
1285000 4877.147976 5436.854216 8121.026714 5394.711381
1295000 4867.709505 5436.854216 8143.711533 5394.711381
1304000 4861.294751 5436.854216 8836.776656 5394.711381
1313000 4874.78377 5436.854216 8817.199668 5394.711381
1322000 5075.378425 5436.854216 9644.091644 5394.711381
1332000 5084.324045 5436.854216 9676.572562 5394.711381
1341000 4124.784886 5436.854216 8440.436688 5394.711381
1350000 4222.365443 5436.854216 8721.173304 5394.711381
1359000 4231.637675 5436.854216 8713.090886 5394.711381
1369000 4186.898696 5436.854216 8489.82004 5394.711381
1378000 4115.195422 5436.854216 8321.261595 5394.711381
1387000 4120.213817 5436.854216 8289.776778 5394.711381
1396000 4062.055723 5436.854216 8150.702833 5394.711381
1406000 4016.444647 5436.854216 7394.741218 5394.711381
1415000 3851.146437 5436.854216 6984.060955 5394.711381
1424000 3602.163068 5436.854216 5947.032498 5394.711381
1433000 3557.385674 5436.854216 5752.579395 5394.711381
1443000 3545.083163 5436.854216 5566.803534 5394.711381
1452000 3556.769886 5436.854216 5568.951746 5394.711381
1461000 3598.763624 5436.854216 5799.97752 5394.711381
1470000 3571.985804 5436.854216 5786.085495 5394.711381
1480000 3547.950776 5436.854216 5720.809119 5394.711381
1489000 3500.54739 5436.854216 5684.279974 5394.711381
1498000 3551.761282 5436.854216 5883.762904 5394.711381
1507000 3479.999416 5436.854216 5712.844657 5394.711381
1517000 3742.394715 5436.854216 7122.655597 5394.711381
1526000 3862.144612 5436.854216 7433.759107 5394.711381
1535000 3947.162615 5436.854216 7576.679681 5394.711381
1544000 3899.202764 5436.854216 7502.51836 5394.711381
1553000 3941.880124 5436.854216 7618.650917 5394.711381
1563000 3823.183743 5436.854216 7327.030348 5394.711381
1572000 3833.19325 5436.854216 7322.471396 5394.711381
1581000 3942.391426 5436.854216 7660.962963 5394.711381
1590000 3896.37793 5436.854216 7534.736776 5394.711381
1600000 3894.874127 5436.854216 7435.579121 5394.711381
1609000 4010.031283 5436.854216 7739.810421 5394.711381
1618000 3694.545293 5436.854216 6260.827474 5394.711381
1627000 3721.07425 5436.854216 6346.092943 5394.711381
1637000 3592.616959 5436.854216 6195.620201 5394.711381
1646000 3596.342089 5436.854216 6258.032103 5394.711381
1655000 3515.007081 5436.854216 5917.602259 5394.711381
1664000 3524.637294 5436.854216 5842.611225 5394.711381
1674000 3454.117194 5436.854216 5799.59023 5394.711381
1683000 3414.006764 5436.854216 5593.812368 5394.711381
1692000 3584.028563 5436.854216 6038.695423 5394.711381
1701000 3500.371087 5436.854216 5876.945165 5394.711381
1711000 3447.495038 5436.854216 5739.319892 5394.711381
1720000 3528.127795 5436.854216 5832.833525 5394.711381
1729000 3392.938312 5436.854216 5487.020133 5394.711381
1738000 3470.859063 5436.854216 5554.712709 5394.711381
1748000 3511.176547 5436.854216 5623.317707 5394.711381
1757000 3429.807474 5436.854216 5550.82247 5394.711381
1766000 3414.069382 5436.854216 5531.986252 5394.711381
1775000 3484.795892 5436.854216 5678.017256 5394.711381
1785000 3502.412908 5436.854216 5726.089727 5394.711381
1794000 3504.021128 5436.854216 5510.00623 5394.711381
1803000 3500.780361 5436.854216 5522.832251 5394.711381
1812000 3606.414001 5436.854216 7120.374062 5394.711381
1822000 3712.775817 5436.854216 7677.900749 5394.711381
1831000 3753.098584 5436.854216 7937.112674 5394.711381
1840000 3710.468196 5436.854216 7938.483192 5394.711381
1849000 3650.506295 5436.854216 7817.623217 5394.711381
1859000 3639.066685 5436.854216 7704.672053 5394.711381
1868000 3656.747051 5436.854216 7864.35834 5394.711381
1877000 3655.12349 5436.854216 8047.393758 5394.711381
1886000 3650.960332 5436.854216 8021.872526 5394.711381
1896000 3459.00136 5436.854216 7756.125758 5394.711381
1905000 3479.494013 5436.854216 7838.925577 5394.711381
1914000 3308.144793 5436.854216 6083.507158 5394.711381
1923000 3214.977568 5436.854216 5616.23831 5394.711381
1933000 3142.33835 5436.854216 5287.639303 5394.711381
1942000 3132.795242 5436.854216 5196.331005 5394.711381
1951000 3199.06027 5436.854216 5294.177692 5394.711381
1960000 3313.091753 5436.854216 5590.104464 5394.711381
1970000 3263.728525 5436.854216 5399.837059 5394.711381
1979000 3283.442012 5436.854216 5133.389122 5394.711381
1988000 3299.612013 5436.854216 5342.940333 5394.711381
1997000 3351.56392 5436.854216 5410.658785 5394.711381
2007000 3422.514222 5436.854216 5675.166988 5394.711381
2016000 3438.890355 5436.854216 5708.40702 5394.711381
2025000 3452.687236 5436.854216 5713.086497 5394.711381
2034000 3554.414357 5436.854216 5975.765099 5394.711381
2044000 3590.115038 5436.854216 6035.264498 5394.711381
2053000 3499.310271 5436.854216 5842.520378 5394.711381
2062000 3389.280294 5436.854216 5594.602171 5394.711381
2071000 3444.028061 5436.854216 5712.755899 5394.711381
2081000 3419.410949 5436.854216 5776.169699 5394.711381
2090000 3378.89711 5436.854216 5532.232467 5394.711381
2099000 3333.40237 5436.854216 5575.957187 5394.711381
2108000 3316.843 5436.854216 5502.765027 5394.711381
2118000 3344.139663 5436.854216 5558.196631 5394.711381
2127000 3349.911174 5436.854216 5561.888372 5394.711381
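
For readers who want to recompute values comparable to the table above, the sketch below uses the conventional definition of held-out perplexity for topic models: the exponential of the negative total log-likelihood divided by the total number of tokens. This appendix does not spell out the exact formula or held-out protocol used for MfTM, so treat the definition as an assumption; the function name and the toy inputs are illustrative.

```python
import math


def perplexity(doc_log_likelihoods, doc_token_counts):
    """exp(-sum_d log p(w_d) / sum_d N_d), with natural logarithms."""
    total_tokens = sum(doc_token_counts)
    if total_tokens <= 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


# Toy check: a model that assigns probability 1/4 to every token
# has perplexity 4, regardless of document lengths.
token_counts = [3, 5]
log_likelihoods = [n * math.log(0.25) for n in token_counts]
toy_value = perplexity(log_likelihoods, token_counts)
```

Lower values indicate a better fit to the held-out text.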

5. Scalability

Impact of the number of topics and semantic enrichment on training time

Corresponding figure in paper: Fig. 7 (a).

Training time  K=50      K=100     K=150     K=200
MfTM-T         00:35:33  00:50:33  01:06:00  01:21:41
MfTM-T+H       00:37:15  00:53:41  01:10:00  01:24:18
MfTM-T+W       00:36:19  00:51:49  01:08:00  01:22:08
MfTM-T+H+W     00:37:48  00:53:23  01:10:00  01:25:13

Statistical analysis of the above results using linear regression:

            Linear function        Goodness of fit
            Weight    Bias         Chi Square  Mean L1 Error  Root Mean Squared Error
MfTM-T      2.14E-04  0.01386825   5.85E-08    1.20E-04       1.21E-04
MfTM-T+H    2.19E-04  2.19E-04     6.37E-07    3.70E-04       3.99E-04
MfTM-T+W    2.13E-04  0.01469565   4.08E-07    2.62E-04       3.19E-04
MfTM-T+H+W  2.21E-04  0.0151964    1.60E-07    1.70E-04       2.00E-04
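
The weights and biases above can be reproduced approximately by an ordinary least-squares fit of training time against K, provided the hh:mm:ss times are first converted to fractions of a day (the usual spreadsheet time unit). That unit is inferred from the numbers, not stated in this appendix. The sketch below performs the fit for two rows; the helper names are illustrative.

```python
def hms_to_days(hms):
    """Convert an hh:mm:ss string to a fraction of a 24-hour day."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) / 86400.0


def fit_line(xs, ys):
    """Ordinary least squares for y = weight * x + bias."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    weight = sxy / sxx
    return weight, mean_y - weight * mean_x


topics = [50, 100, 150, 200]
# Training times copied from the table for Fig. 7 (a).
mftm_t = ["00:35:33", "00:50:33", "01:06:00", "01:21:41"]
mftm_th = ["00:37:15", "00:53:41", "01:10:00", "01:24:18"]

weight_t, bias_t = fit_line(topics, [hms_to_days(t) for t in mftm_t])
weight_th, bias_th = fit_line(topics, [hms_to_days(t) for t in mftm_th])
```

For MfTM-T this gives a weight of about 2.14E-04 and a bias of about 0.0139, close to the listed values; small differences are expected because some times in the table look rounded to the minute. Under the same assumption, the MfTM-T+H times give a bias of roughly 0.015 rather than the listed 2.19E-04, which coincides with that row's weight.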

Impact of dataset size on training time

Corresponding figure in paper: Fig. 7 (b).

Running time  0.5M      1M        1.5M      2M
MfTM 50       00:08:53  00:17:46  00:26:39  00:35:33
MfTM 100      00:12:38  00:25:16  00:37:55  00:50:33
MfTM 200      00:20:25  00:40:51  01:01:16  01:21:41

Statistical analysis of the above results using linear regression:

          Linear function      Goodness of fit
          Weight    Bias       Chi Square  Mean L1 Error  Root Mean Squared Error
MfTM 50   1.23E-02  -5.20E-18  3.00E-13    2.50E-07       2.74E-07
MfTM 100  1.76E-02  -3.47E-18  5.12E-35    2.17E-18       3.58E-18
MfTM 200  2.84E-02  -5.00E-07  2.00E-13    2.00E-07       2.24E-07

Processing time per document (sec.) of MfTM and OG-LDA

Corresponding figure in paper: Fig. 7 (c).

No. of processed docs (thousands) MfTM (K=200) OG-LDA (K=50) OG-LDA (K=100) OG-LDA (K=150) OG-LDA (K=200)
26 0.00410 0.08977 0.08335 0.08658 0.09027
52 0.00387 0.09054 0.09523 0.09719 0.10615
78 0.00381 0.11469 0.11080 0.12838 0.14254
104 0.00376 0.14696 0.14548 0.16838 0.18358
130 0.00368 0.16512 0.17864 0.20227 0.22085
156 0.00364 0.18981 0.20763 0.23945 0.26475
182 0.00362 0.20923 0.23893 0.27965 0.30112
208 0.00359 0.23354 0.26919 0.31547 0.34217
234 0.00358 0.26619 0.30514 0.35168 0.38766
260 0.00353 0.28823 0.34169 0.39166 0.42834

Processing time per document (sec.) of MfTM

No corresponding figure in paper.

No. of processed docs (thousands) K=50 K=100 K=150 K=200
26 0.001971 0.003077 0.002820 0.004097
52 0.001949 0.002949 0.002748 0.003875
78 0.001913 0.002920 0.002736 0.003811
104 0.001869 0.002866 0.002756 0.003759
130 0.001849 0.002865 0.002770 0.003678
156 0.001827 0.002842 0.002758 0.003638
182 0.001790 0.002830 0.002751 0.003620
208 0.001773 0.002826 0.002732 0.003587
234 0.001762 0.002818 0.002734 0.003576
260 0.001770 0.002818 0.002726 0.003533
286 0.001752 0.002801 0.002715 0.003498
312 0.001745 0.002795 0.002707 0.003467
338 0.001743 0.002791 0.002706 0.003446
364 0.001734 0.002781 0.002783 0.003423
390 0.001741 0.002772 0.002798 0.003399
416 0.001748 0.002761 0.002792 0.003384
442 0.001750 0.002752 0.002778 0.003372
468 0.001752 0.002747 0.002758 0.003358
494 0.001757 0.002744 0.002751 0.003346
520 0.001761 0.002742 0.002748 0.003335
546 0.001769 0.002743 0.002767 0.003324
572 0.001770 0.002740 0.002782 0.003312
598 0.001771 0.002738 0.002772 0.003305
624 0.001769 0.002735 0.002764 0.003296
650 0.001771 0.002733 0.002769 0.003300
676 0.001774 0.002729 0.002795 0.003300
702 0.001779 0.002732 0.002793 0.003296
728 0.001782 0.002731 0.002782 0.003293
754 0.001786 0.002730 0.002799 0.003286
780 0.001785 0.002730 0.002808 0.003281
806 0.001781 0.002728 0.002795 0.003276
832 0.001780 0.002726 0.002793 0.003275
858 0.001774 0.002723 0.002785 0.003273
884 0.001772 0.002719 0.002798 0.003271
910 0.001769 0.002716 0.002828 0.003275
936 0.001766 0.002716 0.002851 0.003271
962 0.001764 0.002715 0.002854 0.003274
988 0.001759 0.002713 0.002871 0.003278
1014 0.001756 0.002712 0.002896 0.003275
1040 0.001752 0.002710 0.002914 0.003271
1066 0.001748 0.002708 0.002909 0.003266
1092 0.001745 0.002708 0.002900 0.003263
1118 0.001742 0.002706 0.002890 0.003258
1144 0.001738 0.002706 0.002890 0.003257
1170 0.001734 0.002703 0.002899 0.003259
1196 0.001733 0.002699 0.002893 0.003258
1222 0.001731 0.002697 0.002883 0.003257
1248 0.001728 0.002693 0.002873 0.003255
1274 0.001727 0.002692 0.002863 0.003251
1300 0.001724 0.002687 0.002855 0.003248
1326 0.001724 0.002683 0.002846 0.003245
1352 0.001722 0.002679 0.002837 0.003242
1378 0.001719 0.002676 0.002830 0.003238
1404 0.001716 0.002672 0.002823 0.003236
1430 0.001716 0.002670 0.002816 0.003233
1456 0.001716 0.002670 0.002810 0.003230
1482 0.001715 0.002671 0.002803 0.003229
1508 0.001714 0.002667 0.002797 0.003229
1534 0.001713 0.002664 0.002790 0.003226
1560 0.001710 0.002661 0.002783 0.003222
1586 0.001707 0.002658 0.002777 0.003220
1612 0.001704 0.002655 0.002770 0.003218
1638 0.001701 0.002653 0.002764 0.003223
1664 0.001700 0.002652 0.002759 0.003227
1690 0.001698 0.002652 0.002754 0.003238
1716 0.001696 0.002652 0.002749 0.003251
1742 0.001694 0.002655 0.002744 0.003258
1768 0.001694 0.002657 0.002740 0.003257
1794 0.001692 0.002656 0.002734 0.003255
1820 0.001691 0.002656 0.002728 0.003253
1846 0.001690 0.002658 0.002724 0.003253
1872 0.001688 0.002656 0.002721 0.003251
1898 0.001687 0.002654 0.002717 0.003251
1924 0.001686 0.002654 0.002714 0.003249
1950 0.001685 0.002656 0.002711 0.003248
1976 0.001684 0.002657 0.002706 0.003247
2002 0.001683 0.002659 0.002702 0.003249
2028 0.001681 0.002659 0.002698 0.003256
2054 0.001679 0.002658 0.002694 0.003264
2080 0.001677 0.002657 0.002690 0.003271
2106 0.001666 0.002642 0.002686 0.003273