Title | Speaker | Schedule | Materials |
Bayesian Calibration and Uncertainty Analysis: A Case Study Using a 2-D CFD Turbulence Model | Peter Chien | Wednesday, March 21 1:15 PM |
Materials TBD |
Abstract: The growing use of simulations in the engineering design process promises to reduce the need for extensive physical testing, decreasing both development time and cost. However, as mathematician and statistician George E. P. Box said, “Essentially, all models are wrong, but some are useful.” There are many factors that determine simulation or, more broadly, model accuracy. These factors can be condensed into noise, bias, parameter uncertainty, and model form uncertainty. To counter these effects and ensure that models faithfully match reality to the extent required, simulation models must be calibrated to physical measurements. Further, the models must be validated, and their accuracy must be quantified before they can be relied on in lieu of physical testing. Bayesian calibration provides a solution for both requirements: it optimizes tuning of model parameters to improve simulation accuracy, and it estimates any remaining discrepancy, which is useful for model diagnosis and validation. Also, because model discrepancy is assumed to exist in this framework, it enables robust calibration even for inaccurate models. In this paper, we present a case study to investigate the potential benefits of using Bayesian calibration, sensitivity analyses, and Monte Carlo analyses for model improvement and validation. We will calibrate a 7-parameter k-𝜎 CFD turbulence model simulated in COMSOL Multiphysics®. The model predicts coefficients of lift and drag for an airfoil defined using a 6049-series airfoil parameterization from the National Advisory Committee for Aeronautics (NACA). We will calibrate model predictions using publicly available wind tunnel data from the University of Illinois at Urbana-Champaign (UIUC) database. Bayesian model calibration requires intensive sampling of the simulation model to determine the most likely distribution of calibration parameters, which can be a large computational burden. We greatly reduce this burden by following a surrogate modeling approach, using Gaussian process emulators to mimic the CFD simulation. We train the emulator by sampling the simulation space using a Latin Hypercube Design (LHD) of Experiments (DOE), and assess the accuracy of the emulator using leave-one-out Cross Validation (CV) error. The Bayesian calibration framework involves calculating the discrepancy between simulation results and physical test results. We also use Gaussian process emulators to model this discrepancy. The discrepancy emulator will be used as a tool for model validation; characteristic trends in residual errors after calibration can indicate underlying model form errors which were not addressed via tuning the model calibration parameters. In this way, we will separate and quantify model form uncertainty and parameter uncertainty. The results of a Bayesian calibration include a posterior distribution of calibration parameter values. These distributions will be sampled using Monte Carlo methods to generate model predictions, whereby new predictions have a distribution of values which reflects the uncertainty in the tuned calibration parameters. The resulting output distributions will be compared against physical data and the uncalibrated model to assess the effects of the calibration and discrepancy model. We will also perform global, variance-based sensitivity analysis on the uncalibrated model and the calibrated models, and investigate any changes in the sensitivity indices from uncalibrated to calibrated. |
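As an illustration of the surrogate-modeling step described above, the following minimal sketch trains a Gaussian process emulator on a Latin Hypercube sample and checks it with leave-one-out cross validation. It is not the authors' implementation: the toy simulator, parameter ranges, kernel, and sample size are invented stand-ins, and scikit-learn/SciPy are assumed.

```python
# Minimal sketch: train a Gaussian process emulator on a Latin Hypercube sample
# of a (stand-in) simulator and check it with leave-one-out cross validation.
# The toy simulator, parameter ranges, and kernel choice are illustrative only.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.model_selection import LeaveOneOut

def toy_simulator(x):
    """Stand-in for the expensive CFD run (e.g., a predicted lift coefficient)."""
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2 + 0.1 * x[:, 0] * x[:, 1]

# Latin Hypercube DOE over a 2-D calibration-parameter space scaled to [lo, hi]
sampler = qmc.LatinHypercube(d=2, seed=0)
X = qmc.scale(sampler.random(n=40), l_bounds=[0.0, -1.0], u_bounds=[1.0, 1.0])
y = toy_simulator(X)

kernel = ConstantKernel(1.0) * RBF(length_scale=[0.2, 0.5])
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

# Leave-one-out CV error as an emulator accuracy check
loo_errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    gp.fit(X[train_idx], y[train_idx])
    pred = gp.predict(X[test_idx])
    loo_errors.append(pred[0] - y[test_idx][0])
print("LOO RMSE:", np.sqrt(np.mean(np.square(loo_errors))))

gp.fit(X, y)  # final emulator used inside the Bayesian calibration loop
```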
|||
Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study | Christopher Drake | Wednesday, March 21 1:15 PM |
Abstract: In today’s manufacturing, inspection, and testing world, understanding the capability of the measurement system via Measurement Systems Analyses (MSA) is a crucial activity that provides the foundation for the use of Design of Experiments (DOE) and Statistical Process Control (SPC). Although undesirable, there are times when human observation is the only measurement system available. In these types of situations, traditional MSA tools are often ineffectual due to the nature of the data collected. When there are no other alternatives, we need some method for assessing the adequacy and effectiveness of the human observations. When multiple observers are involved, Attribute Agreement Analyses are a powerful tool for quantifying the agreement and effectiveness of a visual inspection system. This talk will outline best practices and rules of thumb for Attribute Agreement Analyses, and will highlight a recent Army case study to further demonstrate the tool’s use and potential. |
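A minimal sketch of the kind of agreement and effectiveness summaries an attribute agreement analysis produces, assuming a pass/fail inspection with two appraisers and a known standard; the inspection data and the use of scikit-learn's kappa function are illustrative assumptions, not the case-study data or tooling.

```python
# Minimal sketch of an attribute agreement check for a pass/fail visual inspection:
# percent agreement with the known standard and kappa between two appraisers.
# The inspection data below are made up for illustration.
import numpy as np
from sklearn.metrics import cohen_kappa_score

standard   = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])  # reference disposition
appraiser1 = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
appraiser2 = np.array([1, 0, 0, 1, 0, 0, 1, 0, 1, 1])

for name, calls in [("Appraiser 1", appraiser1), ("Appraiser 2", appraiser2)]:
    effectiveness = np.mean(calls == standard)          # agreement with the standard
    print(f"{name}: effectiveness vs. standard = {effectiveness:.0%}")

# Between-appraiser agreement, chance-corrected
print("Kappa (appraiser 1 vs. 2):", round(cohen_kappa_score(appraiser1, appraiser2), 2))
```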
|||
Leveraging Anomaly Detection for Aircraft System Health Data Stability Reporting | Kyle Gartrell | Wednesday, March 21 |
Abstract: Detecting and diagnosing aircraft system health poses a unique challenge as system complexity increases and software is further integrated. Anomaly detection algorithms systematically highlight unusual patterns in large datasets and are a promising methodology for monitoring aircraft system health. The F-35A fighter aircraft is driven by complex, integrated subsystems with both software and hardware components. The F-35A operational flight program is the software that manages each subsystem within the aircraft and the flow of required information and support between subsystems. This information and support are critical to the successful operation of many subsystems. For example, the radar system supplies information to the fusion engine, without which the fusion engine would fail. ACC operational testing can be thought of as equivalent to beta testing for operational flight programs. As in other software, many faults result in minimal loss of functionality and are often unnoticed by the user. However, there are times when a software fault might result in catastrophic functionality loss (i.e., subsystem shutdown). It is critical to catch software problems that will result in catastrophic functionality loss before the flight software is fielded to the combat air forces. Subsystem failures and degradations can be categorized and quantified using simple system health data codes (e.g., degrade, fail, healthy). However, because of the integrated nature of the F-35A, a subsystem degradation may be caused by another subsystem. The 59th Test and Evaluation Squadron collects autonomous system data, pilot questionnaires, and health report codes for F-35A subsystems. Originally, this information was analyzed using spreadsheet tools (i.e., Microsoft Excel). Using this method, analysts were unable to examine all subsystems or attribute cause for subsystem faults. The 59 TES is developing a new process that leverages anomaly detection algorithms to isolate flights with unusual patterns of subsystem failures and, within those flights, highlight which subsystem faults are correlated with increased subsystem failures. This presentation will compare the performance of several anomaly detection algorithms (e.g., K-means, K-nearest neighbors, support vector machines) using simulated F-35A data. |
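For illustration only, the sketch below compares three simple anomaly scores of the kind named in the abstract on simulated per-flight fault-count data; the simulated data, features, and flagging rule are invented and do not represent F-35A data or the 59 TES process.

```python
# Minimal sketch comparing simple anomaly scores (k-means distance, k-nearest-
# neighbor distance, one-class SVM) on simulated per-flight subsystem fault counts.
# The simulated data and thresholds are illustrative, not F-35A data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
normal_flights = rng.poisson(lam=[2, 1, 3], size=(200, 3))   # typical fault counts
odd_flights = rng.poisson(lam=[10, 6, 12], size=(5, 3))      # unusual flights
X = StandardScaler().fit_transform(np.vstack([normal_flights, odd_flights]))

# 1. Distance to the nearest k-means centroid
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
km_score = np.min(km.transform(X), axis=1)

# 2. Distance to the 5th nearest neighbor
nn = NearestNeighbors(n_neighbors=5).fit(X)
knn_score = nn.kneighbors(X)[0][:, -1]

# 3. One-class SVM decision function (negated so higher = more anomalous)
svm_score = -OneClassSVM(nu=0.05, gamma="scale").fit(X).decision_function(X)

for name, score in [("k-means", km_score), ("kNN", knn_score), ("OC-SVM", svm_score)]:
    flagged = np.argsort(score)[-5:]   # five most anomalous flights
    print(name, "flags flights:", sorted(flagged.tolist()))
```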
|||
Application of Adaptive Sampling to Advance the Metamodeling and Uncertainty Quantification Process | Erik Axdahl and Robert A. Baurle | Wednesday, March 21 1:15 PM |
Abstract: Over the years, the aerospace industry has continued to implement design of experiments and metamodeling (e.g., response surface methodology) in order to shift the knowledge curve forward in the systems design process. While the adoption of these methods is still incomplete across aerospace sub-disciplines, they comprise the state of the art during systems design and for design evaluation using modeling and simulation or ground testing. In the context of modeling and simulation, as national high-performance computing infrastructure becomes more capable, so do the demands placed on those resources in terms of simulation fidelity and number of researchers. Furthermore, with recent emphasis placed on the uncertainty quantification of aerospace system design performance, the number of simulation cases needed to properly characterize a system’s uncertainty across the entire design space increases by orders of magnitude, further stressing available resources. This leads to advanced development groups either sticking to ad hoc estimates of uncertainty (e.g., subject matter expert estimates based on experience) or neglecting uncertainty quantification altogether. Advancing the state of the art of aerospace systems design and evaluation requires a practical adaptive sampling scheme that responds to the characteristics of the underlying design or uncertainty space. For example, when refining a system metamodel gradually, points should be chosen for design variable combinations that are located in high-curvature regions or where metamodel uncertainty is greatest. The latter method can be implemented by defining a functional form of the metamodel variance and using it to define the next best point to sample. For schemes that require n points to be sampled simultaneously, considerations can be made to ensure proper sample dispersion. The implementation of adaptive sampling schemes in the design and evaluation process will enable similar fidelity with fewer samples of the design space compared to fixed or ad hoc sampling methods (i.e., less time and fewer human resources required). Alternatively, the uncertainty of the design space can be reduced to a greater extent for the same number of samples, or with fewer samples using higher-fidelity simulations. The purpose of this presentation will be to examine the benefits of adaptive sampling as applied to challenging design problems. Emphasis will be placed on methods that are accessible to engineering practitioners who are not experts in data science, metamodeling, or uncertainty quantification in order to foster adoption within their communities of practice. |
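The variance-driven refinement idea can be sketched in a few lines: fit a metamodel, then sample next where the predictive variance is largest. The one-dimensional test function, kernel, and loop length below are illustrative assumptions, not the authors' scheme.

```python
# Minimal sketch of variance-based adaptive sampling: fit a Gaussian process
# metamodel, then add the candidate point where the predictive variance is largest.
# The 1-D test function and candidate grid are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_model(x):
    return np.sin(5 * x) + 0.3 * x  # stand-in for a high-fidelity simulation

X = np.array([[0.05], [0.35], [0.95]])      # initial small design
y = expensive_model(X).ravel()
candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)

for _ in range(10):                          # adaptive refinement loop
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), normalize_y=True)
    gp.fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    x_next = candidates[[np.argmax(std)]]    # most uncertain location
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_model(x_next).ravel())

print("Sampled locations:", np.round(X.ravel(), 3))
```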
|||
Application of Statistical Methods and Designed Experiments to Development of Technical Requirements | Eli Golden | Wednesday, March 21 1:15 PM |
Abstract: The Army relies heavily on the voice of the customer to develop and refine technical requirements for developmental systems, but too often the approach is reactive. The ARDEC (Armament Research, Development & Engineering Center) Statistics Group at Picatinny Arsenal, NJ, working closely with subject matter experts, has been implementing market research and web development techniques and Design of Experiments (DOE) best practices to design and analyze surveys that provide insight into the customer’s perception of utility for various developmental commodities. Quality organizations tend to focus on ensuring products meet technical requirements, with far less of an emphasis placed on whether or not the specification actually captures customer needs. The employment of techniques and best practices spanning the fields of Market Research, Design of Experiments, and Web Development (choice design, conjoint analysis, contingency analysis, psychometric response scales, stratified random sampling) converges toward a more proactive and risk-mitigating approach to the development of technical and training requirements, and encourages strategic decision-making when faced with the inarticulate nature of human preference. Establishing a hierarchy of customer preference for objective and threshold values of key performance parameters enriches the development process of emerging systems by making the process simultaneously more effective and more efficient. |
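As a hedged illustration of one of the named techniques, the sketch below estimates conjoint part-worth utilities from ratings of hypothetical requirement profiles; the attributes, levels, and ratings are invented, and the simple ratings-based main-effects model is only one of several ways such surveys can be analyzed.

```python
# Minimal sketch of a ratings-based conjoint analysis: estimate part-worth
# utilities for two hypothetical requirement attributes from survey ratings.
# Attribute names, levels, and ratings are invented for illustration.
import pandas as pd
import statsmodels.formula.api as smf

profiles = pd.DataFrame({
    "range_level":  ["threshold", "objective", "threshold", "objective"] * 3,
    "weight_level": ["light", "light", "heavy", "heavy"] * 3,
    "rating":       [6, 9, 4, 7, 5, 8, 3, 7, 6, 9, 4, 6],  # respondent ratings
})

# Main-effects model: part-worths are the estimated level contrasts
fit = smf.ols("rating ~ C(range_level) + C(weight_level)", data=profiles).fit()
print(fit.params)   # intercept plus contrasts, e.g., threshold vs. objective range
```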
|||
Infrastructure Lifetimes | Erika Taketa and William Romine | Wednesday, March 21 1:15 PM |
Abstract: Infrastructure refers to the structures, utilities, and interconnected roadways that support the work carried out at a given facility. In the case of the Lawrence Livermore National Laboratory (LLNL), infrastructure is considered exclusive of scientific apparatus and safety and security systems. LLNL inherited its infrastructure management policy from the University of California, which managed the site during LLNL’s first five decades. This policy is quite different from that used in commercial property management. Commercial practice weighs reliability over cost by replacing infrastructure at industry-standard lifetimes. LLNL practice weighs overall lifecycle cost, seeking to mitigate reliability issues through inspection. To formalize this risk management policy, a careful statistical study was undertaken using 20 years of infrastructure replacement data. In this study, care was taken to adjust for left truncation as well as right censoring. Fifty-seven distinct infrastructure class data sets were fitted to the Generalized Gamma distribution using maximum likelihood estimation (MLE). This distribution is useful because it produces a weighted blending of discrete failure (Weibull model) and complex system failure (Lognormal model). These parametric fittings then yielded median lifetimes and conditional probabilities of failure. From these conditional probabilities, bounds on budget costs could be computed as expected values. This has provided a scientific basis for rational budget management as well as aided operations by prioritizing inspection, repair, and replacement activities. |
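A minimal sketch of the fitting approach described above: maximum likelihood estimation of a Generalized Gamma lifetime model with left truncation and right censoring handled in the likelihood. The synthetic data, starting values, and SciPy parameterization are illustrative assumptions, not the LLNL data or code.

```python
# Minimal sketch of fitting a Generalized Gamma lifetime model by maximum
# likelihood while accounting for left truncation and right censoring.
# The lifetimes, truncation ages, and censoring flags below are synthetic.
import numpy as np
from scipy import optimize
from scipy.stats import gengamma

rng = np.random.default_rng(1)
true = dict(a=2.0, c=1.2, scale=15.0)                  # generating values for the demo
t = gengamma.rvs(true["a"], true["c"], scale=true["scale"], size=300, random_state=rng)
left = rng.uniform(0, 5, size=t.size)                  # age when observation began
keep = t > left                                        # left-truncated sample
t, left = t[keep], left[keep]
censor_time = 40.0
observed = t <= censor_time                            # False = right censored
t = np.minimum(t, censor_time)

def neg_log_lik(log_params):
    a, c, scale = np.exp(log_params)                   # keep parameters positive
    ll = np.where(observed,
                  gengamma.logpdf(t, a, c, scale=scale),   # exact failures
                  gengamma.logsf(t, a, c, scale=scale))    # censored units
    ll -= gengamma.logsf(left, a, c, scale=scale)          # truncation adjustment
    return -np.sum(ll)

res = optimize.minimize(neg_log_lik, x0=np.log([1.0, 1.0, 10.0]), method="Nelder-Mead")
a_hat, c_hat, scale_hat = np.exp(res.x)
print("MLE:", a_hat, c_hat, scale_hat)
print("Median lifetime:", gengamma.median(a_hat, c_hat, scale=scale_hat))
```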
|||
Workforce Analytics | William Romine | Wednesday, March 21 3:00 PM |
Materials TBD |
Abstract: Several statistical methods have been used effectively to model workforce behavior, specifically attrition due to retirement and voluntary separation [1]. Additionally, various authors have introduced career development [2] as a meaningful aspect of workforce planning. While both general and more specific attrition modeling techniques yield useful results, only limited success has followed attempts to quantify career stage transition probabilities. A complete workforce model would include quantifiable flows both vertically and horizontally in the network described pictorially, at a single time point, in Figure 1. The horizontal labels in Figure 1 convey one possible meaning assignable to career stage transition – in this case, competency. More formal examples might include rank within a hierarchy such as in a military organization or grade in a civil service workforce. In the case of the Nuclear Weapons labs, knowing that the specialized, classified knowledge needed to deal with Stockpile Stewardship is being preserved, as evidenced by the production of Masters (individuals capable of independent technical work), is also of interest to governmental oversight. In this paper we examine the allocation of labor involved in a specific Life Extension Program at LLNL. This growing workforce is described by discipline and career stage to determine how well the Norden-Rayleigh development cost model [3] fits the data. Since this model underlies much budget estimation within both DOD and NNSA, the results should be of general interest. The data are also examined as a possible basis for quantifying horizontal flows in Figure 1. |
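For readers unfamiliar with the Norden-Rayleigh model, the sketch below fits its staffing-rate form, m(t) = 2*K*a*t*exp(-a*t^2), to synthetic monthly labor data; the numbers and the curve-fitting approach are illustrative assumptions, not the LLNL analysis.

```python
# Minimal sketch of fitting the Norden-Rayleigh staffing-rate curve,
#   m(t) = 2 * K * a * t * exp(-a * t**2),
# to monthly labor data. The monthly headcounts below are synthetic.
import numpy as np
from scipy.optimize import curve_fit

def norden_rayleigh_rate(t, K, a):
    """Instantaneous effort (e.g., FTEs) at time t; K = total effort, a = shape."""
    return 2.0 * K * a * t * np.exp(-a * t ** 2)

months = np.arange(1, 37)                              # three years of program data
true_K, true_a = 900.0, 0.004
rng = np.random.default_rng(2)
fte = norden_rayleigh_rate(months, true_K, true_a) + rng.normal(0, 1.5, months.size)

(K_hat, a_hat), _ = curve_fit(norden_rayleigh_rate, months, fte, p0=[500.0, 0.01])
peak_month = 1.0 / np.sqrt(2.0 * a_hat)                # month at which the rate peaks
print(f"Estimated total effort K = {K_hat:.0f} person-months, peak at month {peak_month:.1f}")
```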
|||
Optimizing for Mission Success in Highly Uncertain Scenarios | Brian Chell | Wednesday, March 21 3:00 PM |
Materials TBD |
Abstract: Optimization under uncertainty increases the complexity of a problem as well as the computing resources required to solve it. As the amount of uncertainty increases, these difficulties are exacerbated. However, when optimizing for mission-level objectives, rather than component- or system-level objectives, an increase in uncertainty is inevitable. Previous research has found methods to perform optimization under uncertainty, such as robust design optimization or reliability-based design optimization. These are generally executed at a product component quality level, to minimize variability and stay within design tolerances, but they are not tailored to capture the high amount of variability in a mission-level problem. In this presentation, an approach for formulating and solving highly stochastic mission-level optimization problems is described. A case study is shown using an unmanned aerial system (UAS) on a search mission while an “enemy” UAS attempts to interfere. This simulation, modeled in the Unity Game Engine, has highly stochastic outputs, where the time to mission success varies by multiple orders of magnitude, but the ultimate goal is a binary output representing mission success or failure. The results demonstrate the capabilities and challenges of optimization in these types of mission scenarios. |
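A hedged sketch of the general idea (not the Unity-based case study): estimate the probability of mission success by Monte Carlo for each candidate value of a design variable and select the best. The toy mission model and design variable are invented.

```python
# Minimal sketch of mission-level optimization under uncertainty: estimate the
# probability of mission success by Monte Carlo for each candidate design and
# pick the best. The toy "mission simulation" stands in for a full mission model.
import numpy as np

rng = np.random.default_rng(3)

def mission_success(search_speed, n_runs=2000):
    """Toy stochastic mission: success if the target is found before endurance runs out."""
    time_to_find = rng.exponential(scale=10.0 / search_speed, size=n_runs)
    endurance = rng.normal(loc=8.0, scale=1.0, size=n_runs) - 0.5 * search_speed
    return np.mean(time_to_find < endurance)           # estimated P(success)

candidate_speeds = np.linspace(0.5, 5.0, 10)
p_success = [mission_success(s) for s in candidate_speeds]
best = candidate_speeds[int(np.argmax(p_success))]
print(f"Best search speed ~ {best:.2f}, estimated P(success) = {max(p_success):.2f}")
```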
|||
Test Planning for Observational Studies using Poisson Process Modeling | Brian Stone | Wednesday, March 21 3:00 PM |
Abstract: Operational Test (OT) is occasionally conducted after a system is already fielded. Unlike in a traditional test based on Design of Experiments (DOE) principles, it is often not possible to vary the levels of the factors of interest. Instead, the test is observational in nature. Test planning for observational studies involves choosing where, when, and how long to evaluate a system in order to observe the possible combinations of factor levels that define the battlespace. This presentation discusses a test-planning method that uses Poisson process modeling to estimate the length of time required to observe factor-level combinations in the operational environment. |
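A minimal sketch of the underlying calculation, under assumed occurrence rates: with independent Poisson arrivals for each factor-level combination, the probability that combination i is seen at least once in a test of length T is 1 - exp(-rate_i * T), and the required T can be solved numerically. The rates and target probability below are illustrative.

```python
# Minimal sketch of Poisson-process test planning: given assumed occurrence
# rates for each factor-level combination, find the test length T for which
# every combination is observed at least once with a chosen probability.
# The rates and target below are illustrative assumptions.
import numpy as np
from scipy.optimize import brentq

rates = np.array([0.50, 0.20, 0.05, 0.02])   # events per test day, per combination
target = 0.80                                # required P(all combinations observed)

def prob_all_observed(T):
    # Independent Poisson counts: P(N_i(T) >= 1) = 1 - exp(-rate_i * T)
    return np.prod(1.0 - np.exp(-rates * T))

T_req = brentq(lambda T: prob_all_observed(T) - target, 1e-6, 1e4)
print(f"Required test length: {T_req:.1f} days")
```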
|||
Model Credibility in Statistical Reliability Analysis with Limited Data | Caleb King | Wednesday, March 21 3:00 PM |
Abstract: Due to financial and production constraints, it has become increasingly common for analysts and test planners in defense applications to find themselves working with smaller amounts of data than seen in industry. These same analysts are also being asked to make strong statistical statements based on this limited data. For example, a common goal is ‘demonstrating’ a high reliability requirement with sparse data. In such situations, strong modeling assumptions are often used to achieve the desired precision. Such model-driven actions carry levels of risk that customers may not be aware of and that may be too high to be considered acceptable. There is a need to articulate and mitigate the risk associated with model form error in statistical reliability analysis. In this work, we review different views on model credibility from the statistical literature and discuss how these notions of credibility apply in data-limited settings. Specifically, we consider two different perspectives on model credibility: (1) data-driven credibility metrics for model fit, and (2) credibility assessments based on consistency of analysis results with prior beliefs. We explain how these notions of credibility can be used to drive test planning and recommend an approach to presenting analysis results in data-limited settings. We apply this approach to two case studies from reliability analysis: Weibull analysis and Neyer D-optimal test plans. |
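To make the Weibull case study concrete, the sketch below fits a two-parameter Weibull to a small synthetic failure data set and attaches a rough bootstrap lower bound on reliability at a mission time; the data, mission time, and bootstrap choice are illustrative assumptions and deliberately leave out the model-form risk the talk addresses.

```python
# Minimal sketch of a small-sample Weibull reliability analysis: MLE fit with a
# fixed location of zero and a bootstrap lower confidence bound on reliability
# at a mission time. The failure times and mission time are synthetic.
import numpy as np
from scipy.stats import weibull_min

failures = np.array([152., 208., 259., 311., 390., 447., 512.])  # small data set
t_mission = 100.0

shape, _, scale = weibull_min.fit(failures, floc=0)
print("Point estimate of reliability:", weibull_min.sf(t_mission, shape, scale=scale))

# Nonparametric bootstrap for a rough lower bound (model-form risk not included)
rng = np.random.default_rng(4)
boot = []
for _ in range(2000):
    resample = rng.choice(failures, size=failures.size, replace=True)
    c, _, s = weibull_min.fit(resample, floc=0)
    boot.append(weibull_min.sf(t_mission, c, scale=s))
print("Approx. 90% lower bound:", np.quantile(boot, 0.10))
```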
|||
Sound Level Recommendations for Quiet Sonic Boom Dose-Response Community Surveys | Jasme Lee | Wednesday, March 21 3:00 PM |
Abstract: The current ban on commercial overland supersonic flight may be replaced by a noise limit on sonic boom sound level. NASA is establishing a quiet sonic boom database to guide the new regulation. The database will consist of multiple community surveys used to model the dose-response relationship between sonic boom sound levels and human annoyance. There are multiple candidate dose-response modeling techniques, such as classical logistic regression and multilevel modeling. To plan for these community surveys, recommendations for data collection will be developed from pilot community test data. Two important aspects are selecting sample size and sound level range. Selection of sample size must be strategic, as large sample sizes are costly, whereas small sample sizes may result in more uncertainty in the estimates. Likewise, there are trade-offs associated with selection of the sound level range. If the sound level range includes excessively high sound levels, the public may misunderstand the potential impact of quiet sonic booms, resulting in a negative backlash against a promising technological advancement. Conversely, a narrow range that includes only low sound levels might exclude the eventual noise limit. This presentation will focus on recommendations for sound level range given the expected shape of the dose-response curve. |
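A minimal sketch of the classical logistic dose-response fit mentioned above, using simulated annoyance data over an assumed sound level range; the range, sample size, and underlying curve are invented, not NASA survey values.

```python
# Minimal sketch of a dose-response fit: logistic regression of annoyance
# (0/1) on sonic boom sound level, then the estimated curve over a candidate
# sound level range. Doses and responses are simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
level_db = rng.uniform(65, 90, size=400)                 # assumed range of levels
true_prob = 1 / (1 + np.exp(-(level_db - 80) / 3))       # assumed underlying curve
annoyed = rng.binomial(1, true_prob)

X = sm.add_constant(level_db)
fit = sm.Logit(annoyed, X).fit(disp=False)
grid = sm.add_constant(np.linspace(65, 90, 6))
print("Estimated P(annoyed) across the range:", np.round(fit.predict(grid), 2))
```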
|||
Development of a Locking Setback Mass for Cluster Munition Applications: A UQ Case Study | Melissa Jablonski | Wednesday, March 21 3:00 PM |
Abstract: The Army is currently developing a cluster munition that is required to meet a functional reliability requirement of 99%. This effort focuses on the design process for a setback lock within the safe and arm (S&A) device in the submunition fuze. This lock holds the arming rotor in place, thus preventing the fuze from beginning its arming sequence until the setback lock retracts during a launch event. Therefore, the setback lock is required not to arm (to remain in place) during a drop event (safety) and to arm during a launch event (reliability). In order to meet these requirements, uncertainty quantification techniques were used to evaluate setback lock designs. We designed a simulation experiment, simulated the setback lock behavior in a drop event and in a launch event, fit a model to the results, and optimized the design for safety and reliability. Currently, eight candidate designs that meet the requirements are being manufactured, and adaptive sensitivity testing is planned to inform the surrogate models and improve their predictive capability. A final optimized design will be chosen based on the improved models, and realistic drop safety and arm reliability predictions will be obtained using Monte Carlo simulations of the surrogate models. |
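A hedged sketch of the final step described above: propagating input variation through a surrogate with Monte Carlo to estimate arming reliability. The surrogate form, input distributions, and margin threshold are invented placeholders, not the Army models.

```python
# Minimal sketch of using a fitted surrogate inside a Monte Carlo loop to
# estimate arming reliability for a candidate setback-lock design. The
# surrogate form, input distributions, and threshold are invented placeholders.
import numpy as np

rng = np.random.default_rng(6)

def surrogate_arming_margin(spring_force, friction, setback_accel):
    """Placeholder polynomial surrogate for 'margin to retract' in a launch event."""
    return 0.8 * setback_accel - 3.0 * spring_force - 12.0 * friction + 1.5

n = 100_000
spring = rng.normal(2.0, 0.1, n)       # manufacturing variation in spring force
friction = rng.uniform(0.05, 0.15, n)  # friction coefficient uncertainty
accel = rng.normal(10.0, 1.0, n)       # launch setback acceleration

margin = surrogate_arming_margin(spring, friction, accel)
reliability = np.mean(margin > 0.0)    # lock retracts (arms) when margin is positive
print(f"Estimated arming reliability: {reliability:.4f}")
```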