| Characteristic | Overall (N=15,149) | Pre-DM/DM (n=2010) | No pre-DM/DM (n=13,139) |
| --- | --- | --- | --- |
| Sociodemographic | | | |
| Age (years), median (IQR) | 15 (13-17) | 15 (13-17) | 16 (14-17) |
| Female sex, n (%) | 7430 (49) | 691 (34.4) | 6739 (51.3) |
| Race or ethnicity, n (%) | | | |
| Black, non-Hispanic | 4292 (28.3) | 676 (33.6) | 3616 (27.5) |
| Hispanic | 5565 (36.7) | 711 (35.4) | 4854 (36.9) |
| White, non-Hispanic | 4033 (26.6) | 431 (21.4) | 3602 (27.4) |
| Other | 1259 (8.3) | 192 (9.6) | 1067 (8.1) |
| Health insurance, n (%) | | | |
| Private | 6392 (43) | 744 (37.7) | 5648 (43.8) |
| Medicare, government, or single service | 2026 (13.6) | 268 (13.6) | 1758 (13.6) |
| Medicaid or CHIP | 3637 (24.4) | 564 (28.6) | 3073 (23.8) |
| No insurance | 2821 (19) | 395 (20) | 2426 (18.8) |
| Authorized for food stamps, n (%) | 7833 (69.4) | 1037 (61.1) | 6796 (70.8) |
| Health status | | | |
| BMI percentile, n (%) | | | |
| Underweight (BMI percentile < 5th), n (%) | 462 (3.1) | 40 (2.0) | 422 (3.2) |
| Normal weight (5th ≤ BMI percentile < 85th), n (%) | 8516 (56.8) | 933 (46.8) | 7583 (58.4) |
| Overweight (85th ≤ BMI percentile < 95th), n (%) | 2788 (18.6) | 356 (17.9) | 2432 (18.7) |
| Obese (95th ≤ BMI percentile), n (%) | 3214 (21.5) | 663 (33.3) | 2551 (19.6) |
| Hypertensive, n (%) | 2552 (17.4) | 502 (26.1) | 2050 (16.1) |
| High total cholesterol (≥170 mg/dL), n (%) | 4951 (33.2) | 707 (35.6) | 4244 (32.8) |
| Fasting plasma glucose (mg/dL), median (IQR) | 93 (88-98) | 102 (100-106) | 91 (86-95) |
| Hemoglobin A1c (%), median (IQR) | 5.2 (5.0-5.4) | 5.5 (5.2-5.7) | 5.2 (5.0-5.3) |
| Diet, median (IQR) | | | |
| Meals eaten out per week | 2 (1-3) | 2 (1-3) | 2 (1-3) |
| Total grain (oz eq) intake 24 hours prior | 6.55 (4.24-9.66) | 6.43 (4.19-9.58) | 6.57 (4.25-9.67) |
| Total fruits (cup eq) intake 24 hours prior | 0.38 (0.00-1.44) | 0.26 (0.00-1.37) | 0.40 (0.00-1.45) |
| Total vegetable (cup eq) intake 24 hours prior | 0.88 (0.39-1.58) | 0.84 (0.37-1.54) | 0.89 (0.39-1.59) |
| Total protein (oz eq) intake 24 hours prior | 5.29 (2.71-9.15) | 4.73 (2.46-8.37) | 5.38 (2.76-9.34) |
| Added sugar (tsp eq) intake 24 hours prior | 20.42 (11.49-32.49) | 20.09 (11.15-31.89) | 20.48 (11.57-32.59) |
| Other lifestyle behaviors | | | |
| Physical activity minutes per week, median (IQR) | 209 (45-488) | 210 (49-476) | 209 (45-491) |
| Screen time hours per day, median (IQR) | 5 (3-8) | 5 (3-8) | 5 (2-7) |
| Exposed to secondhand smoke at home, n (%) | 3297 (21.9) | 469 (23.6) | 2828 (21.7) |
a Unweighted statistics of some key variables describing the study population in the youth pre-DM/DM data set overall and by pre-DM/DM status. More detailed statistics for all the variables in our data set can be found in the Data Exploration section of POND.
b Pre-DM/DM: pre–diabetes mellitus and diabetes mellitus.
c CHIP: Children's Health Insurance Program.
d Hypertensive was defined as blood pressure ≥90th percentile, or ≥120/80 mm Hg for children 13 years of age and older [2].
e eq: equivalent.
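The BMI percentile cut points listed in Table 1 can be expressed as a small lookup. The following Python snippet is only an illustrative sketch: the function name `bmi_category` is hypothetical, and it assumes the age- and sex-specific BMI percentile has already been computed elsewhere (that calculation is not shown here).

```python
def bmi_category(bmi_percentile: float) -> str:
    """Map a BMI percentile (0-100) to the weight categories used in Table 1.

    Hypothetical helper for illustration; the cut points (5th, 85th, 95th)
    are the ones printed in the table rows.
    """
    if not 0 <= bmi_percentile <= 100:
        raise ValueError("BMI percentile must lie between 0 and 100")
    if bmi_percentile < 5:
        return "Underweight"
    if bmi_percentile < 85:
        return "Normal weight"
    if bmi_percentile < 95:
        return "Overweight"
    return "Obese"


if __name__ == "__main__":
    for value in (3.0, 50.0, 90.0, 97.5):
        print(value, bmi_category(value))
```

Each boundary value falls into the higher category, matching the "≤" signs on the lower bounds in the table.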
We estimated that the survey-weighted prevalence of pre-DM/DM in our study population rose substantially from 4.1% (95% CI 2.8-5.4) in 1999 to 22% (95% CI 18.5-25.6) in 2018 (Figure S3 and section S6 in Multimedia Appendix 1). This increasing trend was consistent with that reported in other NHANES-based studies, in which pre-DM/DM prevalence ranged from 17.7% to 18% [18,19]. We also applied the study population and pre-DM definition criteria reported in a recent study [13] to NHANES data and derived a similarly sized study population (n=6656 vs n=6598 in the current vs previous analysis [13]) and youth pre-DM prevalence, which ranged from 11.1% (95% CI 8.9-13.3) to 37.3% (95% CI 31.0-43.6) in our analysis compared with 11.6% (95% CI 9.5-14.1) to 28.2% (95% CI 23.3-33.6) in the study by Liu et al [13] (Table S6 in Multimedia Appendix 1).
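As a rough illustration of what "survey-weighted prevalence" means, the point estimate is the sum of the sampling weights of participants with the condition divided by the sum of all weights. The Python sketch below uses a hypothetical function name and toy inputs. It does not reproduce the confidence intervals reported above, which would require design-based variance estimation using the NHANES strata and sampling units (not shown here).

```python
def weighted_prevalence(has_condition, weights):
    """Weighted proportion: sum of weights among cases / sum of all weights.

    has_condition: iterable of 0/1 (or bool) indicators, one per participant.
    weights: iterable of nonnegative sampling weights, same length.
    Hypothetical helper; point estimate only, no variance estimation.
    """
    has_condition = list(has_condition)
    weights = list(weights)
    if len(has_condition) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    case_weight = sum(w for flag, w in zip(has_condition, weights) if flag)
    return case_weight / total_weight


if __name__ == "__main__":
    # Toy data: the single case represents 3 of the 4 weighted persons.
    print(weighted_prevalence([1, 0], [3.0, 1.0]))
```

With equal weights, the estimate reduces to the ordinary unweighted proportion, which is what Table 1 reports.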
Youth Pre-DM/DM-Focused Data Set
We extracted 95 epidemiological variables from NHANES and organized them into 4 pre-DM/DM-related domains, namely, sociodemographic, health status, diet, and other lifestyle behaviors (Table S1 in Multimedia Appendix 1). Table 1 shows the unweighted statistics of some key study population characteristics. Among youth with pre-DM/DM (n=2010), the proportions of youth who were non-Hispanic Black, non-Hispanic White, Hispanic, and of other race or ethnicity (including non-Hispanic persons who reported races other than Black or White and non-Hispanic Asian) were 33.6% (n=676), 21.4% (n=431), 35.4% (n=711), and 9.6% (n=192), respectively. Approximately half (7719/15,149, 51%) of the population were male, and they represented 65.6% (1319/2010) of those with pre-DM/DM. Approximately 32.4% (4528/15,149) of the youth had a family income below poverty level, and 69.4% (7833/15,149) were from households receiving food stamps. The proportion of youth covered by private insurance was higher among those with no pre-DM/DM than among those with pre-DM/DM (5648/13,139, 43.8% vs 744/2010, 37.7%). Overall, 21.5% (3214/15,149) of the youth were obese as defined by having a BMI at or above the 95th percentile based on age and gender, and the proportion was 33.3% (663/2010) among youth with pre-DM/DM. Youth with pre-DM/DM tended to have less fruit and vegetable intake and ate lower amounts of protein and total grains than those with no pre-DM/DM. Youth with and with no pre-DM/DM showed similar amounts of physical activity, with medians of 210 and 209 minutes per week, respectively (Table 1).
Pre-DM/DM in Youth Online Dashboard
To facilitate other researchers’ use of our youth pre-DM/DM data set and make our methodology transparent and reproducible, we developed POND, which is available online [47]. Users can navigate POND through its built-in functionalities. For example, users can explore the details of the 95 individual variables (Figure 3A) and their distributions by pre-DM/DM status (Figure 3B), examine the risk factors of youth pre-DM/DM identified from the case studies described below (Figure 3C), and download the data for customized analysis and the analytical code to replicate our findings (Figure 3D). In addition, we make available all the code used to develop the data set, our case studies, and POND itself.
![Figure 3](https://asset.jmir.pub/assets/a6948eaf5354f70225906db84ed098ed.png)
Case Studies Using Our Data Set to Better Understand Youth Pre-DM/DM
We examined the validity and use of our processed multidomain data set for translational studies on youth pre-DM/DM by the following 2 complementary types of data analyses.
Identifying Individual Variables Associated With Pre-DM/DM Status
In our bivariate analyses, we found 27 variables to be significantly (P<.001, Bonferroni adjusted) associated with pre-DM/DM status (Figure 4 [63] and Table S7 in Multimedia Appendix 1). These variables spanned all 4 domains and included gender, race or ethnicity, use of food stamps, health insurance status, BMI, total protein intake, and screen time. Similar results were found when repeating these bivariate association tests after accounting for NHANES survey design elements (Table S7 in Multimedia Appendix 1).
![Figure 4](https://asset.jmir.pub/assets/d5cae469281ac754fee554763323e9ef.png)
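The Bonferroni adjustment mentioned in the bivariate analysis multiplies each raw P value by the number of tests performed and caps the result at 1. The Python sketch below illustrates that step only. The function names and the three toy P values are hypothetical, the per-variable tests that produce the raw P values are not shown, and the default threshold of .001 is taken from the text above.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw P value by the number of tests, capping at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def select_significant(p_values_by_variable, alpha=0.001):
    """Return {variable: adjusted P} for variables with adjusted P < alpha.

    Hypothetical helper; alpha defaults to the threshold quoted in the text.
    """
    names = list(p_values_by_variable)
    adjusted = bonferroni_adjust([p_values_by_variable[n] for n in names])
    return {n: adj for n, adj in zip(names, adjusted) if adj < alpha}


if __name__ == "__main__":
    # Made-up raw P values for three illustrative variables.
    raw = {"bmi": 1e-12, "screen_time": 2e-6, "meals_out": 0.04}
    print(select_significant(raw))
```

In the actual analysis the number of tests would be the number of variables examined, so each raw P value would be scaled accordingly.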
Predicting Youth Pre-DM/DM Status With ML
We used an ML framework, EI [53,54], to leverage the multidomain nature of our data set and predict youth pre-DM/DM status. We also compared EI’s performance with alternative prediction approaches, most prominently the widely used XGBoost algorithm [71].
The best-performing multidomain EI methodology, stacking [75] using logistic regression, predicted youth pre-DM/DM status (AUROC=0.67; BA=0.62) more accurately than all the alternative approaches (Figure 5), namely, XGBoost (AUROC=0.64; BA=0.60; Wilcoxon rank sum FDR=1.7×10⁻⁴ and 1.8×10⁻⁴, respectively), the ADA pediatric screening guidelines (AUROC=0.57; BA=0.57; Wilcoxon rank sum FDR=1.7×10⁻⁴ and 1.8×10⁻⁴, respectively), and the 4 single-domain EI models (AUROC=0.54-0.63; BA=0.53-0.60; FDR<1.7×10⁻⁴ and 1.8×10⁻⁴, respectively).
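For readers less familiar with the two metrics reported above, the following Python sketch computes them from first principles. It assumes that AUROC denotes the area under the receiver operating characteristic curve and BA the balanced accuracy (the mean of sensitivity and specificity). The function names and toy inputs are hypothetical, and this is not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Uses a simple O(n_pos * n_neg) pair count,
    which is fine for illustration but slow for large data sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity for binary 0/1 labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, yhat in pairs if y == 1 and yhat == 1)
    fn = sum(1 for y, yhat in pairs if y == 1 and yhat == 0)
    tn = sum(1 for y, yhat in pairs if y == 0 and yhat == 0)
    fp = sum(1 for y, yhat in pairs if y == 0 and yhat == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


if __name__ == "__main__":
    y = [1, 1, 0, 0]
    print(auroc(y, [0.9, 0.4, 0.5, 0.1]))
    print(balanced_accuracy(y, [1, 0, 0, 0]))
```

Both metrics equal 0.5 for an uninformative predictor regardless of class imbalance, which makes them suitable when one class (here, youth with no pre-DM/DM) is much larger than the other.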
The multidomain EI also identified 27 variables (the same as the number of significant variables from bivariate analyses) that contributed the most to predicting youth pre-DM/DM status. Among these variables, 16 overlapped with those identified from the bivariate statistical analyses (Figure 6; Fisher P of overlap=7.06×10⁻⁶). These variables identified by both approaches included some established pre-DM/DM risk factors such as BMI and high total cholesterol, as well as some less-recognized ones such as screen time and taking prescription drugs [2].
![Figure](https://asset.jmir.pub/assets/4c0de3ef50793c1f8cf4e968227e4e40.png)
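The significance of an overlap between two variable lists can be assessed with a one-sided Fisher exact test, which is equivalent to a hypergeometric tail probability. The Python sketch below shows that calculation under stated assumptions: the function name is hypothetical, and the universe is taken to be the 95 variables in the data set with 27 variables in each list and 16 shared. The exact test configuration used in the study is not described in this section, so the result need not match the reported P value.

```python
from math import comb


def overlap_p_value(universe, size_a, size_b, overlap):
    """One-sided hypergeometric tail: P(X >= overlap).

    X is the number of items shared when a list of size_b is drawn at
    random, without replacement, from `universe` items, size_a of which
    belong to the other list. Hypothetical helper for illustration.
    """
    if not 0 <= overlap <= min(size_a, size_b):
        raise ValueError("overlap cannot exceed either list size")
    if max(size_a, size_b) > universe:
        raise ValueError("list sizes cannot exceed the universe size")
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Assumed inputs: 95 variables, two lists of 27, 16 in common.
    print(overlap_p_value(95, 27, 27, 16))
```

For comparison, two random lists of 27 variables drawn from 95 would be expected to share about 27×27/95≈7.7 variables.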
Principal Findings
Leveraging the rich information in NHANES spanning nearly 20 years, we built what is, to our knowledge, the most comprehensive epidemiological data set for studying youth pre-DM/DM. We accomplished this by selecting and harmonizing variables relevant to youth pre-DM/DM from the sociodemographic, health status, diet, and other lifestyle behaviors domains. This youth pre-DM/DM data set, as well as several functionalities to explore and analyze it, is publicly available in our user-friendly web portal, POND. We also conducted case studies using the data set with both traditional statistical methods and ML approaches to demonstrate the potential of using this data set to identify factors relevant to youth pre-DM/DM. The combination of the comprehensive public data set and POND provides avenues for more informed investigations of youth pre-DM/DM.
The future translational impact of pre-DM/DM research, facilitated by comprehensive data sets such as the one developed in this study, holds significant promise for advancing our understanding of the disease and its risk factors among youth. By enabling researchers to investigate multifactorial variables associated with pre-DM/DM, this data set contributes to several areas of research and has a broader impact on the scientific community. First, the data set’s comprehensive nature allows researchers to explore the collective impact of various risk factors across multiple health domains. By incorporating sociodemographic factors, health status indicators, diet, and lifestyle behaviors, researchers can gain a holistic understanding of the interplay between these factors and pre-DM/DM risk among youth. This knowledge can be used to generate hypotheses for further studies and inform the development of targeted interventions and prevention strategies that address the specific needs of at-risk populations. Furthermore, the data set provides an opportunity to delve into less-studied variables and their interactions in relation to pre-DM/DM risk. Variables such as screen time, acculturation, or frequency of eating out, which are often overlooked in traditional research, can be examined to uncover their potential influence on pre-DM/DM risk among youth. This expands the scope of translational research and enhances our understanding of the multifaceted nature of the disease.
One of the major contributions of our work is POND, our publicly available web portal, which provides access to all materials related to our data set and analyses, thus enabling transparency and reproducibility. Although several such portals are available in other biomedical areas, such as genomics [76-78], there is a general lack of such tools in epidemiology and public health. We hope that, in addition to facilitating studies of pre-DM/DM, POND illustrates the use of such portals for population and epidemiological studies.
The results of the case studies and validation exercises we conducted were also consistent with existing literature. The case studies identified known pre-DM/DM risk factors, such as gender [15,17,19], race and ethnicity [2,9,10,24], health measures (BMI, hypertension, and cholesterol) [2,55], income [9,11], insurance status [9,10], and health care availability [9,10], thus affirming the validity of the data set. In addition, our analyses revealed some less studied variables, such as screen time, home ownership status, self-reported health status, soy and nut consumption, and frequency of school meal intake, which may influence youth pre-DM/DM risk. Further study of these variables may reveal new knowledge about pre-DM/DM among youth. More generally, such novel findings demonstrate the use of our data set and data-driven methods for further translational discoveries about this complex disorder.
Limitations
Although our work has several strengths and high potential use for youth pre-DM/DM studies, it is not without limitations. First, because our data set was derived from NHANES, it inherits the limitations of the survey. As NHANES is a cross-sectional survey, the pre-DM/DM status and its related variables provide only consecutive snapshots of youth in the United States across the available survey cycles. Thus, the associations identified are better suited for hypothesis generation and require in-depth investigation using prospective longitudinal and randomized trial designs. In addition, we modified the ADA guideline for determining pre-DM/DM status according to variable availability. Owing to the high missingness (45%) of family history (DIQ170) and the complete missingness of maternal history (DIQ175S) from 1999 to 2010 in the raw NHANES data, we were unable to include family history of diabetes in the data set. Similarly, NHANES does not provide data on every condition associated with insulin resistance. Therefore, we used hypertension and high cholesterol as proxies for insulin resistance. Nevertheless, as our main purpose is to use POND as a conduit between this comprehensive youth pre-DM/DM database and interested researchers, our method can be adapted to longitudinal data sets should they become available in the future. Second, for the prediction of pre-DM/DM status, EI’s performance was found to be significantly better than that of the alternative approaches, including a modified form of the suggested guideline [45]. However, this performance assessment was based only on cross-validation, which is no substitute for the validation on external data sets that is necessary for rigorous assessment.
Finally, while our preliminary case study analyses identified a wide range of variables associated with youth prediabetes and diabetes, other known risk factors, such as current asthma status [80-82], added sugar consumption [83-85], sugary fruit and juice intake [83-86], and physical activity per week [6-8,50], were not identified. This limitation can be addressed by using other data analysis methods beyond our bivariate testing and ML approaches, highlighting more potential use cases of our data set.
Conclusions
Overall, the future impact of translational pre-DM/DM research facilitated by comprehensive data sets and web servers like ours extends beyond individual studies. It creates opportunities for interdisciplinary collaboration and reproducibility, strengthens evidence-based decision-making, and supports the development of targeted interventions for the prevention and management of pre-DM/DM among youth. By providing rich resources, our work can enable researchers to build upon existing knowledge and push the boundaries of translational pre-DM/DM research, ultimately leading to improved health outcomes for at-risk populations.
Acknowledgments
This study was enabled in part by computational resources provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai. The Ensemble Integration used in this work was implemented by Jamie JR Bennett. This work was funded by National Institutes of Health grants R21DK131555 and R01HG011407.
Data Availability
The data set and code used in this study are available at Zenodo [87] and our web portal POND [47].
Authors' Contributions
BL and GP contributed equally as cosenior and cosupervisory authors. NV, BL, and GP conceptualized the project. CM, YCL, NV, BL, and GP designed the methodology. CM and BL implemented the data curation and bivariate analyses. YCL implemented the ML case study and POND. CM and YCL conducted formal analysis and visualization. CM, YCL, NV, BL, and GP wrote the manuscript. NV, BL, and GP supervised the project.
Conflicts of Interest
None declared.
Multimedia Appendix 1
Supplemental materials.
References
- Temneanu OR, Trandafir LM, Purcarea MR. Type 2 diabetes mellitus in children and adolescents: a relatively new clinical problem within pediatric practice. J Med Life. 2016;9(3):235-239. [ FREE Full text ] [ Medline ]
- ElSayed NA, Aleppo G, Aroda VR, Bannuru RR, Brown FM, Bruemmer D, et al. 2. Classification and diagnosis of diabetes: standards of care in diabetes-2023. Diabetes Care. 2023;46(Suppl 1):S19-S40. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Weiss R, Dufour S, Taksali SE, Tamborlane WV, Petersen KF, Bonadonna RC, et al. Prediabetes in obese youth: a syndrome of impaired glucose tolerance, severe insulin resistance, and altered myocellular and abdominal fat partitioning. Lancet. 2003;362(9388):951-957. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Zhang Y, Luk AOY, Chow E, Ko GTC, Chan MHM, Ng M, et al. High risk of conversion to diabetes in first-degree relatives of individuals with young-onset type 2 diabetes: a 12-year follow-up analysis. Diabet Med. 2017;34(12):1701-1709. [ CrossRef ] [ Medline ]
- Zhuang P, Liu X, Li Y, Wan X, Wu Y, Wu F, et al. Effect of diet quality and genetic predisposition on hemoglobin A and type 2 diabetes risk: gene-diet interaction analysis of 357,419 individuals. Diabetes Care. 2021;44(11):2470-2479. [ CrossRef ] [ Medline ]
- Pivovarov JA, Taplin CE, Riddell MC. Current perspectives on physical activity and exercise for youth with diabetes. Pediatr Diabetes. 2015;16(4):242-255. [ CrossRef ] [ Medline ]
- Colberg SR, Sigal RJ, Yardley JE, Riddell MC, Dunstan DW, Dempsey PC, et al. Physical activity/exercise and diabetes: a position statement of the American Diabetes Association. Diabetes Care. 2016;39(11):2065-2079. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Thomson NM, Kraft N, Atkins RC. Cell-mediated immunity in glomerulonephritis. Aust N Z J Med. 1981;11(Suppl 1):104-108. [ Medline ]
- Hill-Briggs F, Adler NE, Berkowitz SA, Chin MH, Gary-Webb TL, Navas-Acien A, et al. Social determinants of health and diabetes: a scientific review. Diabetes Care. 2020;44(1):258-279. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Butler AM. Social determinants of health and racial/ethnic disparities in type 2 diabetes in youth. Curr Diab Rep. 2017;17(8):60. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Walker RJ, Smalls BL, Campbell JA, Strom Williams JL, Egede LE. Impact of social determinants of health on outcomes for type 2 diabetes: a systematic review. Endocrine. 2014;47(1):29-48. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Bansal N. Prediabetes diagnosis and treatment: a review. World J Diabetes. 2015;6(2):296-303. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Liu J, Li Y, Zhang D, Yi SS, Liu J. Trends in prediabetes among youths in the US from 1999 through 2018. JAMA Pediatr. 2022;176(6):608-611. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Tönnies T, Brinks R, Isom S, Dabelea D, Divers J, Mayer-Davis EJ, et al. Projections of type 1 and type 2 diabetes burden in the US population aged 20 years through 2060: the SEARCH for Diabetes in Youth Study. Diabetes Care. Feb 1, 2023;46(2):313-320. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Lawrence JM, Divers J, Isom S, Saydah S, Imperatore G, Pihoker C, et al. SEARCH for Diabetes in Youth Study Group. Trends in prevalence of type 1 and type 2 diabetes in children and adolescents in the US, 2001-2017. JAMA. 2021;326(8):717-727. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Jensen ET, Dabelea D. Type 2 diabetes in youth: new lessons from the SEARCH Study. Curr Diab Rep. 2018;18(6):36. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Dabelea D, Mayer-Davis EJ, Saydah S, Imperatore G, Linder B, Divers J, et al. Prevalence of type 1 and type 2 diabetes among children and adolescents from 2001 to 2009. JAMA. 2014;311(17):1778-1786. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Andes LJ, Cheng YJ, Rolka DB, Gregg EW, Imperatore G. Prevalence of prediabetes among adolescents and young adults in the United States, 2005-2016. JAMA Pediatr. 2020;174(2):e194498. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Menke A, Casagrande S, Cowie CC. Prevalence of diabetes in adolescents aged 12 to 19 years in the United States, 2005-2014. JAMA. 2016;316(3):344-345. [ CrossRef ] [ Medline ]
- Khan MAB, Hashim MJ, King JK, Govender RD, Mustafa H, Al Kaabi J. Epidemiology of type 2 diabetes—global burden of disease and forecasted trends. J Epidemiol Glob Health. 2020;10(1):107-111. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Lin X, Xu Y, Pan X, Xu J, Ding Y, Sun X, et al. Global, regional, and national burden and trend of diabetes in 195 countries and territories: an analysis from 1990 to 2025. Sci Rep. 2020;10(1):14790. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Imperatore G, Boyle JP, Thompson TJ, Case D, Dabelea D, Hamman RF, et al. Projections of type 1 and type 2 diabetes burden in the U.S. population aged 20 years through 2050: dynamic modeling of incidence, mortality, and population growth. Diabetes Care. Dec 2012;35(12):2515-2520. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Herman WH, Ma Y, Uwaifo G, Haffner S, Kahn SE, Horton ES, et al. Diabetes Prevention Program Research Group. Differences in A1C by race and ethnicity among patients with impaired glucose tolerance in the Diabetes Prevention Program. Diabetes Care. 2007;30(10):2453-2457. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Kahkoska AR, Shay CM, Crandell J, Dabelea D, Imperatore G, Lawrence JM, et al. Association of race and ethnicity with glycemic control and hemoglobin A levels in youth with type 1 diabetes. JAMA Netw Open. 2018;1(5):e181851. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Lascar N, Brown J, Pattison H, Barnett AH, Bailey CJ, Bellary S. Type 2 diabetes in adolescents and young adults. Lancet Diabetes Endocrinol. 2018;6(1):69-80. [ CrossRef ] [ Medline ]
- Lee AM, Fermin CR, Filipp SL, Gurka MJ, DeBoer MD. Examining trends in prediabetes and its relationship with the metabolic syndrome in US adolescents, 1999-2014. Acta Diabetol. 2017;54(4):373-381. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Weiss R, Taksali SE, Tamborlane WV, Burgert TS, Savoye M, Caprio S. Predictors of changes in glucose tolerance status in obese youth. Diabetes Care. 2005;28(4):902-909. [ CrossRef ] [ Medline ]
- Nadeau K, Anderson B, Berg E, Chiang J, Chou H, Copeland K, et al. Youth-onset type 2 diabetes consensus report: current status, challenges, and priorities. Diabetes Care. 2016;39(9):1635-1642. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Dart A, Martens P, Rigatto C, Brownell M, Dean H, Sellers E. Earlier onset of complications in youth with type 2 diabetes. Diabetes Care. 2014;37(2):436-443. [ CrossRef ] [ Medline ]
- American Diabetes Association. Economic costs of diabetes in the U.S. in 2017. Diabetes Care. 2018;41(5):917-928. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Al-Goblan AS, Al-Alfi MA, Khan MZ. Mechanism linking diabetes mellitus and obesity. Diabetes Metab Syndr Obes. 2014;7:587-591. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Chan JCN, Lim L, Wareham NJ, Shaw JE, Orchard TJ, Zhang P, et al. The Lancet Commission on diabetes: using data to transform diabetes care and patient lives. Lancet. 2021;396(10267):2019-2082. [ CrossRef ] [ Medline ]
- IDF Diabetes Atlas, 10th Edition. International Diabetes Federation. URL: https://diabetesatlas.org/ [accessed 2024-05-16]
- U.S. Chronic Disease Indicators: Diabetes | Chronic Disease and Health Promotion Data & Indicators. URL: https://chronicdata.cdc.gov/Chronic-Disease-Indicators/U-S-Chronic-Disease-Indicators-Diabetes/f8ti-h92k [accessed 2023-05-17]
- Homepage of NCD Risk Factor Collaboration. NCD Risk Factor Collaboration. URL: https://ncdrisc.org/index.html [accessed 2023-05-17]
- NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in diabetes since 1980: a pooled analysis of 751 population-based studies with 4.4 million participants. Lancet. 2016;387(10027):1513-1530. [ FREE Full text ] [ CrossRef ] [ Medline ]
- UCI Machine Learning Repository. Diabetes 130-US hospitals for years 1999-2008 Data Set. URL: https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008 [accessed 2023-05-20]
- Type 2 Diabetes Knowledge Portal. URL: https://t2d.hugeamp.org/ [accessed 2023-05-17]
- Rashid A. Diabetes Dataset. Mendeley Data. German. Elsevier; Jul 18, 2020. URL: https://data.mendeley.com/datasets/wj9rwkp9c2/1 [accessed 2024-05-16]
- Diabetes Dataset 2019. URL: https://www.kaggle.com/datasets/tigganeha4/diabetes-dataset-2019 [accessed 2023-05-20]
- Diabetes Health Indicators Dataset. URL: https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset [accessed 2023-05-17]
- Vangeepuram N, Liu B, Chiu P, Wang L, Pandey G. Predicting youth diabetes risk using NHANES data and machine learning. Sci Rep. 2021;11(1):11212. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Nagarajan S, Khokhar A, Holmes DS, Chandwani S. Family consumer behaviors, adolescent prediabetes and diabetes in the national health and nutrition examination survey (2007-2010). J Am Coll Nutr. 2017;36(7):520-527. [ CrossRef ] [ Medline ]
- Wallace AS, Wang D, Shin J, Selvin E. Screening and diagnosis of prediabetes and diabetes in US children and adolescents. Pediatrics. 2020;146(3):e20200265. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Chu P, Patel A, Helgeson V, Goldschmidt AB, Ray MK, Vajravelu ME. Perception and awareness of diabetes risk and reported risk-reducing behaviors in adolescents. JAMA Netw Open. 2023;6(5):e2311466. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Patel CJ, Pho N, McDuffie M, Easton-Marks J, Kothari C, Kohane IS, et al. A database of human exposomes and phenomes from the US National Health and Nutrition Examination Survey. Sci Data. 2016;3:160096. [ FREE Full text ] [ CrossRef ] [ Medline ]
- PreDM/DM in youth ONline Dashboard (POND). URL: https://rstudio-connect.hpc.mssm.edu/POND/ [accessed 2024-02-02]
- Freepik. URL: https://www.flaticon.com [accessed 2024-05-31]
- Zipf G, Chiappa M, Porter KS, Ostchega Y, Lewis BG, Dostal J. National health and nutrition examination survey: plan and operations, 1999-2010. Vital Health Stat 1. 2013;(56):1-37. [ FREE Full text ] [ Medline ]
- Sampath Kumar A, Maiya AG, Shastry BA, Vaishali K, Ravishankar N, Hazari A, et al. Exercise and insulin resistance in type 2 diabetes mellitus: a systematic review and meta-analysis. Ann Phys Rehabil Med. 2019;62(2):98-103. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Karstoft K, Winding K, Knudsen SH, Nielsen JS, Thomsen C, Pedersen BK, et al. The effects of free-living interval-walking training on glycemic control, body composition, and physical fitness in type 2 diabetic patients: a randomized, controlled trial. Diabetes Care. 2013;36(2):228-236. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Karstoft K, Christensen CS, Pedersen BK, Solomon TPJ. The acute effects of interval- vs continuous-walking exercise on glycemic control in subjects with type 2 diabetes: a crossover, controlled study. J Clin Endocrinol Metab. 2014;99(9):3334-3342. [ CrossRef ] [ Medline ]
- Li Y, Wang L, Law J, Murali T, Pandey G. Integrating multimodal data through interpretable heterogeneous ensembles. Bioinform Adv. 2022;2(1):vbac065. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Bennett JJR, Li YC, Pandey G. An open-source Python package for multi-modal data integration using heterogeneous ensembles. arXiv. Preprint posted online on January 17, 2024. 2024. [ FREE Full text ] [ CrossRef ]
- Arslanian S, Bacha F, Grey M, Marcus M, White N, Zeitler P. Evaluation and management of youth-onset type 2 diabetes: a position statement by the American Diabetes Association. Diabetes Care. 2018;41(12):2648-2668. [ FREE Full text ] [ CrossRef ] [ Medline ]
- Centers for Disease Control and Prevention. The SAS Program for CDC Growth Charts. SAS Program. URL: https://www.cdc.gov/nccdphp/dnpao/growthcharts/resources/sas.htm [accessed 2024-05-20]
- BernardRosner. Childhood blood pressure macro-batch mode. URL: https://sites.google.com/a/channing.harvard.edu/bernardrosner/pediatric-blood-press/childhood-blood-pressure [accessed 2023-05-19]
- United States Department of Agriculture (USDA). Food consumption and nutrient intake. URL: https://www.ers.usda.gov/data-products/food-consumption-and-nutrient-intakes/ [accessed 2024-05-20]
- Caruana R, Niculescu-Mizil A, Crew G, Ksikes A. Ensemble selection from libraries of models. In: Machine Learning. ACM International Conference Proceeding Series; 2004. Presented at: Proceedings of the Twenty-first International Conference (ICML 2004); July 4-8 2004; Banff, Alberta, Canada.
- Caruana R, Munson A, Niculescu-Mizil A. Getting the most out of ensemble selection. In: Machine Learning. IEEE Computer Society; 2006. Presented at: Proceedings of the 6th {IEEE} International Conference on Data Mining; March 24 2023:828-833; Hong Kong, China. URL: https://www.researchgate.net/publication/220766367_Getting_the_Most_Out_of_Ensemble_Selection
- Brodersen KH, Ong CS, Stephan KE, Buhmann JM. The balanced accuracy and its posterior distribution. In: Machine Learning. IEEE; 2004. Presented at: 2010 20th International Conference on Pattern Recognition; August 23-26 2010; Istanbul, Turkey. URL: https://ieeexplore.ieee.org/document/5597285/authors#authors
- Benjamini Y, Hochberg Y. Controlling the false discovery rate. a practical and powerful approach to multiple testing. 1995;57(1):289-300. [ FREE Full text ]
- R Markdown Format for flexible dashboards. URL: https://pkgs.rstudio.com/flexdashboard/ [accessed 2023-05-18]
- Shiny. Welcome to shiny. URL: https://shiny.posit.co/r/getstarted/shiny-basics/lesson1/index.html [accessed 2023-05-18]
- Kwak SG, Kim JH. Central limit theorem: the cornerstone of modern statistics. Korean J Anesthesiol. 2017;70(2):144-156.
- Tomczak M, Tomczak E. The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends Sport Sci. Feb 15, 2014;1(21):19-25.
- Herman WH, Smith PJ, Thompson TJ, Engelgau MM, Aubert RE. A new and simple questionnaire to identify people at increased risk for undiagnosed diabetes. Diabetes Care. 1995;18(3):382-387.
- Bang H, Edwards AM, Bomback AS, Ballantyne CM, Brillon D, Callahan MA, et al. Development and validation of a patient self-assessment score for diabetes risk. Ann Intern Med. 2009;151(11):775-783.
- Poltavskiy E, Kim DJ, Bang H. Comparison of screening scores for diabetes and prediabetes. Diabetes Res Clin Pract. 2016;118:146-153.
- Whalen S, Pandey OP, Pandey G. Predicting protein function and other biomedical characteristics with heterogeneous ensembles. Methods. 2016;93:92-102.
- Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA. Association for Computing Machinery; 2016. URL: https://dl.acm.org/doi/10.1145/2939672.2939785
- Shwartz-Ziv R, Armon A. Tabular data: deep learning is not all you need. Information Fusion. May 2022;81:84-90.
- Goyal K, Dumancic S, Blockeel H. Feature interactions in XGBoost. arXiv. Preprint posted online on July 11, 2020.
- Feature Interaction Constraints. XGBoost 2.0.3 documentation. URL: https://xgboost.readthedocs.io/en/stable/tutorials/feature_interaction_constraint.html [accessed 2024-02-01]
- Sesmero MP, Ledezma AI, Sanchis A. Generating ensembles of heterogeneous classifiers using stacked generalization. WIREs Data Min & Knowl. 2015;5(1):21-34.
- Whetzel PL, Noy NF, Shah NH, Alexander PR, Nyulas C, Tudorache T, et al. BioPortal: enhanced functionality via new web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Res. 2011;39(Web Server issue):W541-W545.
- Bhattacharya S, Andorf S, Gomes L, Dunn P, Schaefer H, Pontius J, et al. ImmPort: disseminating data to the public for the future of immunology. Immunol Res. 2014;58(2-3):234-239.
- Zhang J, Baran J, Cros A, Guberman JM, Haider S, Hsu J, et al. International Cancer Genome Consortium Data Portal: a one-stop shop for cancer genomics data. Database (Oxford). 2011;2011:bar026.
- NHANES - NCHS Research Ethics Review Board Approval. 2022. URL: https://www.cdc.gov/nchs/nhanes/irba98.htm [accessed 2024-01-19]
- Rayner L, McGovern A, Creagh-Brown B, Woodmansey C, de Lusignan S. Type 2 diabetes and asthma: systematic review of the bidirectional relationship. Curr Diabetes Rev. 2019;15(2):118-126.
- Black MH, Anderson A, Bell RA, Dabelea D, Pihoker C, Saydah S, et al. Prevalence of asthma and its association with glycemic control among youth with diabetes. Pediatrics. 2011;128(4):e839-e847.
- Wu TD. Diabetes, insulin resistance, and asthma: a review of potential links. Curr Opin Pulm Med. 2021;27(1):29-36.
- Vartanian LR, Schwartz MB, Brownell KD. Effects of soft drink consumption on nutrition and health: a systematic review and meta-analysis. Am J Public Health. 2007;97(4):667-675.
- Greenwood DC, Threapleton DE, Evans CEL, Cleghorn CL, Nykjaer C, Woodhead C, et al. Association between sugar-sweetened and artificially sweetened soft drinks and type 2 diabetes: systematic review and dose-response meta-analysis of prospective studies. Br J Nutr. 2014;112(5):725-734.
- Malik VS, Popkin BM, Bray GA, Després JP, Willett WC, Hu FB. Sugar-sweetened beverages and risk of metabolic syndrome and type 2 diabetes: a meta-analysis. Diabetes Care. 2010;33(11):2477-2483.
- Muraki I, Imamura F, Manson JE, Hu FB, Willett WC, van Dam RM, et al. Fruit consumption and risk of type 2 diabetes: results from three prospective longitudinal cohort studies. BMJ. 2013;347:f5001.
- McDonough C, Li Y. Youth preDM/DM dataset and Case Studies. Switzerland. Zenodo; 2024. URL: https://zenodo.org/records/10531245 [accessed 2024-05-29]
Abbreviations
| ADA | American Diabetes Association |
| AUROC | area under the receiver operating characteristic curve |
| BA | balanced accuracy |
| DM | diabetes mellitus |
| EI | Ensemble Integration |
| FDR | false discovery rate |
| FPG | fasting plasma glucose |
| HbA1c | glycated hemoglobin |
| ML | machine learning |
| NHANES | National Health and Nutrition Examination Survey |
| POND | Prediabetes/diabetes in youth Online Dashboard |
| pre-DM | pre-diabetes mellitus |
| SDoH | social determinants of health |
| XGBoost | extreme gradient boosting |
Edited by A Mavragani, T Sanchez; submitted 05.10.23; peer-reviewed by S El Khamlichi, C Zhao, Y Su; comments to author 09.01.24; revised version received 06.02.24; accepted 26.04.24; published 02.07.24.
©Catherine McDonough, Yan Chak Li, Nita Vangeepuram, Bian Liu, Gaurav Pandey. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 02.07.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.
Implementation of Zero Defect Manufacturing using quality prediction: a spot welding case study from Bosch
Recommendations
A practical guide for implementing zero defect manufacturing in new or existing manufacturing systems
The approach to achieving zero defects by using Industry 4.0 technologies is what constitutes Zero Defect Manufacturing (ZDM). However, its implementation is not a simple task since it requires careful design and new methods. The current ...
Zero Defect Manufacturing ontology: A preliminary version based on standardized terms
The global transition from traditional manufacturing systems to Industry 4.0 compatible systems has already begun. Therefore, the digitization of the manufacturing systems across the globe is increasing with exponential growth which ...
- Develop an initial ontology for the Zero Defect Manufacturing (ZDM) domain.
Optimizing efficiency and zero-defect manufacturing with in-process inspection: challenges, benefits, and aerospace application
In this paper, we present a comprehensive study on the implementation of machine vision-enabled in-process quality inspection systems in machining operations. Our objective is to enable zero-defect manufacturing by maximizing efficiency and ...
Information
Publisher: Elsevier Science Publishers B. V., Netherlands
Author tags
- Zero Defect Manufacturing
- Machine learning
- Quality prediction
- Simulation Industry 4.0
- Research-article
Title: A Case Study on Contextual Machine Translation in a Professional Scenario of Subtitling
Abstract: Incorporating extra-textual context such as film metadata into the machine translation (MT) pipeline can enhance translation quality, as indicated by automatic evaluation in recent work. However, the positive impact of such systems in industry remains unproven. We report on an industrial case study carried out to investigate the benefit of MT in a professional scenario of translating TV subtitles with a focus on how leveraging extra-textual context impacts post-editing. We found that post-editors marked significantly fewer context-related errors when correcting the outputs of MTCue, the context-aware model, as opposed to non-contextual models. We also present the results of a survey of the employed post-editors, which highlights contextual inadequacy as a significant gap consistently observed in MT. Our findings strengthen the motivation for further work within fully contextual MT.
Comments: Accepted to EAMT 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
MLeap is a serialization format and execution engine, and provides two advantages for our ML Platform. Firstly, MLeap comes out of the box with support for Yelp's most commonly used ML libraries (Spark, XGBoost, scikit-learn, and TensorFlow) and can additionally be extended with custom transformers to support edge cases.
Yelp's website, Yelp.com, is a crowd-sourced local business review site. Its business model relies on relevant reviews (on a scale of 1-5 stars), which generate advertising revenue. The content's searchability is very important for businesses: an HBS study found that each "star" in a Yelp rating affected the business owner's sales by 5-9%. Machine learning has been integral to ...
This dataset contains labeled customer reviews from Yelp (deceptive reviews and true reviews), which are used to train our predictive models for the fake-review machine learning approach. Source: It ...
Sentiment Analysis: Predicting Yelp Scores. Bhanu Prakash Reddy Guda, Mashrin Srivastava, Deep Karkhanis. In this work, we predict the sentiment of restaurant reviews based on a subset of the Yelp Open Dataset. We utilize the meta features and text available in the dataset and evaluate several machine learning and state-of-the-art deep learning ...
Speakers: Ryan Irwin, Engineering Manager, Yelp Inc. Ryan Irwin is a senior engineering manager at Yelp. He leads the teams responsible for the ML Platform, wh...
Word Clouds of Negative (Left) and Positive (Right) Reviews. Image by author.
Natural Language Processing
Leveraging the text reviews left by each Yelp user, I was able to create new rating scores for each user by incorporating sentiment analysis of their text reviews. Sentiment analysis is the process of determining the attitude or emotion of the user (whether it is positive or negative or ...
Yelp, founded in 2004, is a multinational corporation that publishes crowd-sourced online reviews on local businesses. As of 2014, Yelp.com had 57 million reviews and 132 million monthly visitors [1]. A portion of their large dataset is available on the Yelp Dataset Challenge homepage, which includes
Yelp Open Dataset. We utilize the meta features and text available in the dataset and evaluate several machine learning and state-of-the-art deep learning approaches for the prediction task. Through several qualitative experiments, we show the success of the deep models with attention mechanism in learning a balanced model for
Abstract. The paper attempts to document the application of relevant Machine Learning (ML) models on Yelp (a crowd-sourced local business review and social networking site) dataset to analyze ...
Sentiment Analysis: A Systematic Case Study with Yelp Scores. This article experiments with various existing machine learning algorithms, from easy logistic regression to BERT embedding-based deep models, and uses ensemble to combine the aforementioned models into a single predictor, seeing if a combination of these models will achieve better ...
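The snippet above does not say how the individual models are combined into a single predictor. As a minimal sketch of one common option, soft voting, the following averages per-class probabilities from several models and picks the highest. The function name `soft_vote` and the example probabilities are hypothetical and are not taken from the article.

```python
def soft_vote(probabilities_per_model):
    """Average per-class probabilities across models and return the winner.

    `probabilities_per_model` is a list of dicts, one per model, each mapping
    a class label to that model's predicted probability for one review.
    """
    if not probabilities_per_model:
        raise ValueError("need at least one model's probabilities")
    n_models = len(probabilities_per_model)
    labels = probabilities_per_model[0].keys()
    # Mean probability per class over all models.
    averaged = {
        label: sum(p[label] for p in probabilities_per_model) / n_models
        for label in labels
    }
    # The ensemble prediction is the class with the highest mean probability.
    best_label = max(averaged, key=averaged.get)
    return best_label, averaged


# Made-up outputs from three hypothetical models for a single review.
label, averaged = soft_vote([
    {"positive": 0.6, "negative": 0.4},
    {"positive": 0.3, "negative": 0.7},
    {"positive": 0.8, "negative": 0.2},
])
```

Soft voting lets a confident model outweigh two lukewarm ones, which a simple majority vote would not.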
Introduction. The Yelp Dataset Challenge makes a huge set of user, business, and review data publicly available for machine learning projects. They wish to find interesting trends and patterns in all of the data they have accumulated. Our goal is to predict how useful a review will prove to be to users. We can use review upvotes as a metric.
Reviews Using Different Machine Learning Algorithms: A Case Study of Yelp
Yi Luo 1 and Xiaowei Xu 2,*
1 College of Business Administration, Capital University of Economics and Business, Beijing 100070, China
2 School of Business Administration, Southwestern University of Finance and Economics, Chengdu 611130, China
This study empirically analyzed online restaurant reviews from Yelp in the era of the COVID-19 pandemic using traditional machine learning methods as well as deep learning methods. Based on the number of restaurant reviews posted on Yelp, an observable decline in March and a sharp decline in April were consistent with the timeline of how the ...
This value is approximated by the date of the first Yelp review. This means that restaurants that joined Yelp late or do not receive frequent comments would appear to have a relatively younger age than their real value. Also, the restaurant age is limited by the date Yelp was founded (i.e., 2004).
4. Machine learning models and optimization
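As a sketch of the restaurant-age approximation described in the preceding paragraph, the function below takes a restaurant's review dates and returns its approximate age in years. The name `approximate_restaurant_age`, the `as_of` reference date, and the use of January 1, 2004 as the founding date are assumptions for illustration. The text gives only the founding year.

```python
from datetime import date

# Assumption: the text states only the year Yelp was founded (2004);
# January 1 is a placeholder for the exact founding date.
YELP_FOUNDED = date(2004, 1, 1)


def approximate_restaurant_age(review_dates, as_of):
    """Approximate a restaurant's age in years from its earliest review.

    The result is a lower bound on the true age: a restaurant that opened
    well before its first review will appear younger than it really is.
    """
    if not review_dates:
        raise ValueError("at least one review date is required")
    first_review = min(review_dates)
    # No review can predate the platform, so clamp to the founding date.
    first_review = max(first_review, YELP_FOUNDED)
    return (as_of - first_review).days / 365.25


# Example: the earliest review is from mid-2010, evaluated in mid-2020.
age = approximate_restaurant_age(
    [date(2015, 6, 1), date(2010, 6, 1)], as_of=date(2020, 6, 1)
)
```

Because the estimate depends only on the earliest review, it cannot exceed the time elapsed since the platform launched, which is the upper limit the text describes.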
Machine Learning Case Study on Yelp. Yelp may not look like a tech company at first glance. However, it makes effective use of machine learning to improve its users' experience to a great extent. ... Machine Learning Case Studies in Life Science and Biology 7. Development of Microbiome ...
We predict restaurant ratings from Yelp reviews based on Yelp Open Dataset. Data distribution is presented, and one balanced training dataset is built. Two vectorizers are experimented for feature engineering. Four machine learning models including Naive Bayes, Logistic Regression, Random Forest, and Linear Support Vector Machine are implemented. Four transformer-based models containing BERT ...
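To make the bag-of-words plus Naive Bayes step concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the study's implementation. The whitespace tokenizer stands in for a real vectorizer, the function names are hypothetical, and the four training reviews are made up rather than drawn from the Yelp Open Dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for a real vectorizer."""
    return text.lower().split()


def train_naive_bayes(labeled_reviews):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_reviews:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior for `text`."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, balanced training set (two reviews per class), made up for illustration.
toy_reviews = [
    ("great food and friendly service", "positive"),
    ("delicious pasta great atmosphere", "positive"),
    ("terrible service and cold food", "negative"),
    ("rude staff terrible experience", "negative"),
]
model = train_naive_bayes(toy_reviews)
```

Calling `predict(model, "great pasta")` scores each class by its prior plus the smoothed log-likelihood of every known word. Balancing the training set, as the snippet describes, keeps the prior from dominating that score.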
Abstract: We use over 350,000 Yelp reviews on 5,000 restaurants to perform an ablation study on text preprocessing techniques. We also compare the effectiveness of several machine learning and deep learning models on predicting user sentiment (negative, neutral, or positive). For machine learning models,
Various Machine Learning Case Studies. Contribute to kkaushi4/Machine-Learning-Case-Studies development by creating an account on GitHub.
Sentiment analysis is an NLP application that extracts emotional information from text. In this study, we investigate the performance of a sequence-based model, i.e., LSTM, compared with ...
Here are the five best machine learning case studies explained: 1. Machine Learning Case Study on Dell. The multinational leader in technology, Dell, empowers people and communities from across the globe with superior software and hardware. Since data is a core part of Dell's hard drive, their marketing team needed a data-driven solution that ...
Detection results and case studies. We started testing v8 in shadow mode in March 2024. Every hour, v8 is classifying more than 17 million unique IPs that participate in residential proxy attacks. Figure 4 shows the geographic distribution of IPs with residential proxy activity belonging to more than 45 thousand ASNs in 237 countries/regions.
Helpful online reviews could be utilized to create sustainable marketing strategies in the restaurant industry, which contributes to national sustainable economic development. In this study, the main aspects (including food/taste, experience, location, and value) were extracted empirically from 294,034 reviews on Yelp.com using Latent Dirichlet Allocation (LDA) and positive and negative ...
Background: The prevalence of type 2 diabetes mellitus (DM) and pre-diabetes mellitus (pre-DM) has been increasing among youth in recent decades in the United States, prompting an urgent need for understanding and identifying their associated risk factors. Such efforts, however, have been hindered by the lack of easily accessible youth pre-DM/DM data.
B Zhou, Y Svetashova, S Byeon, T Pychynski, R Mikut, E Kharlamov, Predicting Quality of Automated Welding with Machine Learning and Semantics: A Bosch Case Study, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Association for Computing Machinery, New York, NY, USA, 2020, pp. 2933-2940.
Review websites, such as TripAdvisor and Yelp, allow users to post online reviews for various businesses, products and services, and have been recently shown to have a significant influence on consumer shopping behaviour. An online review typically consists of free-form text and a star rating out of 5. The problem of predicting a user's star rating for a product, given the user's text review ...