Breakouts and Mini-tutorials for DATAWorks 2018 will be held on March 21st and 22nd (Wednesday and Thursday).
Title | Speaker | Schedule | Materials

---
Testing Autonomous Systems | Darryl Ahner | Wednesday, March 21 10:30 AM

Autonomous robotic systems (hereafter referred to simply as autonomous systems) have attracted interest in recent years as capabilities improve to operate in unstructured, dynamic environments without continuous human guidance. Acquisition of autonomous systems potentially decreases personnel costs and provides a capability to operate in dirty, dull, or dangerous mission segments or to achieve greater operational performance. Autonomy enables a particular action of a system to be automatic or, within programmed boundaries, self-governing. For our purposes, autonomy is defined as the system having a set of intelligence-based capabilities (i.e., learned behaviors) that allows it to respond to situations that were not pre-programmed or anticipated (i.e., learning-based responses) prior to system deployment. Autonomous systems have a degree of self-governance and self-directed behavior, possibly with a human’s proxy for decisions. Because of these intelligence-based capabilities, autonomous systems pose new challenges in conducting test and evaluation that assures adequate performance, safety, and cybersecurity outcomes. We propose an autonomous systems architecture concept and map the elements of a decision theoretic view of a generic decision problem to the components of this architecture. These models offer a foundation for developing a decision-based, common framework for autonomous systems. We also identify some of the various challenges faced by the Department of Defense (DoD) test and evaluation community in assuring the behavior of autonomous systems as well as test and evaluation requirements, processes, and methods needed to address these challenges.

---
Screening Experiments with Partial Replication | David Edwards | Wednesday, March 21 10:30 AM

Small screening designs are frequently used in the initial stages of experimentation with the goal of identifying important main effects as well as to gain insight on potentially important two-factor interactions. Commonly utilized experimental designs for screening (e.g., resolution III or IV two-level fractional factorials, Plackett-Burman designs, etc.) are unreplicated and as such, provide no unbiased estimate of experimental error. However, if statistical inference is considered an integral part of the experimental analysis, one view is that inferential procedures should be performed using the unbiased pure error estimate. As full replication of an experiment may be quite costly, partial replication offers an alternative for obtaining a model independent error estimate. Gilmour and Trinca (2012, Applied Statistics) introduce criteria for the design of optimal experiments for statistical inference (providing for the optimal selection of replicated design points). We begin with an extension of their work by proposing a Bayesian criterion for the construction of partially replicated screening designs with less dependence on an assumed model. We then consider the use of the proposed criterion within the context of multi-criteria design selection where estimation and protection against model misspecification are considered. Insights for analysis and model selection in light of partial replication will be provided.
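
For readers who have not worked with pure error before, the sketch below illustrates the point made above, that replicated design points yield a model-independent variance estimate. It is only a toy example with an invented eight-run, partially replicated design; it is not the authors' Bayesian construction criterion, and pandas/NumPy are assumed to be available.

```python
import numpy as np
import pandas as pd

# Hypothetical partially replicated two-level screening design (factors A, B, C).
# Rows 0/1 and 4/5 are replicated pairs; the remaining runs are unreplicated.
design = pd.DataFrame({
    "A": [-1, -1, -1, 1, 1, 1, -1, 1],
    "B": [-1, -1, 1, -1, 1, 1, 1, -1],
    "C": [1, 1, -1, -1, 1, 1, 1, 1],
    "y": [12.1, 11.4, 14.9, 10.2, 15.3, 16.1, 13.7, 9.8],
})

# Pure error: pooled within-replicate-group variation, independent of any model.
groups = design.groupby(["A", "B", "C"])["y"]
ss_pe = groups.apply(lambda g: ((g - g.mean()) ** 2).sum()).sum()
df_pe = (groups.size() - 1).sum()
print(f"pure error SS = {ss_pe:.3f} on {df_pe} df, "
      f"variance estimate = {ss_pe / df_pe:.3f}")
```

---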
What is Bayesian Experimental Design? | Blaza Toman | Wednesday, March 21 10:30 AM

In an experiment with a single factor with three levels, treatments A, B, and C, a single treatment is to be applied to each of several experimental units selected from some set of units. The response variable is continuous, and differences in its value show the relative effectiveness of the treatments. An experimental design will dictate which treatment is applied to which units. Since differences in the response variable are used to judge differences between treatments, the most important goal of the design is to prevent the treatment effect being masked by some unrelated property of the experimental units. A second important function of the design is to ensure power, that is, that if the treatments are not equally effective, the differences in the response variable are likely to be larger than background noise. Classical experimental design theory uses three principles: replication, randomization, and blocking, to produce an experimental design. Replication refers to how many units are used, blocking is a possible grouping of the units to reduce between unit heterogeneity, and randomization governs the assignment of units to treatments. Classical experimental designs are balanced as much as possible, that is, the three treatments are applied the same number of times in each potential block of units. Bayesian experimental design aims to make use of additional related information, often called prior information, to produce a design. The information may be in the form of related experimental results, for example, treatments A and B may have been previously studied. It could be additional information about the experimental units, or about the response variable. This additional information could be used to change the usual blocking, to reduce the number of units assigned to treatments A and B compared to C, and/or reduce the total number of units needed to ensure power. This talk aims to explain Bayesian design concepts and illustrate them on realistic examples.

---
Test Science Synergies for Complex Systems Test Events: Designed Experiments and Automated Software Test | Jim Simpson | Wednesday, March 21 10:30 AM

Rigorous, efficient and effective test science techniques are individually taking hold in many software centric DoD acquisition programs, both in developmental and operational test regimes. These techniques include agile software development, cybersecurity test and evaluation (T&E), design and analysis of experiments and automated software testing. Many software centric programs must also be tested together with other systems to demonstrate they can be successfully integrated into a more complex system of systems. This presentation focuses on the two test science disciplines of designed experiments (DOE) and automated software testing (AST) and describes how they can be used effectively and leverage one another in planning for and executing a system of systems test strategy. We use the Navy’s Distributed Common Ground System as an example.

---
Building a Universal Helicopter Noise Model Using Machine Learning | Eric Greenwood | Wednesday, March 21 10:30 AM

Helicopters serve a number of useful roles within the community; however, community acceptance of helicopter operations is often limited by the resulting noise. Because the noise characteristics of helicopters depend strongly on the operating condition of the vehicle, effective noise abatement procedures can be developed for a particular helicopter type, but only when the noisy regions of the operating envelope are identified. NASA Langley Research Center—often in collaboration with other US Government agencies, industry, and academia—has conducted noise measurements for a wide variety of helicopter types, from light commercial helicopters to heavy military utility helicopters. While this database is expansive, it covers only a fraction of helicopter types in current commercial and military service and was measured under a limited set of ambient conditions and vehicle configurations. This talk will describe a new “universal” helicopter noise model suitable for planning helicopter noise abatement procedures. Modern machine learning techniques will be combined with the principle of nondimensionalization and applied to NASA’s helicopter noise data in order to develop a model capable of estimating the noisy operating states of any conventional helicopter under any specific ambient conditions and vehicle configurations.

---
Using Ensemble Methods and Physiological Sensor Data to Predict Pilot Cognitive State | Tina Heinich | Wednesday, March 21 10:30 AM

The goal of the Crew State Monitoring (CSM) project is to use machine learning models trained with physiological data to predict unsafe cognitive states in pilots such as Channelized Attention (CA) and Startle/Surprise (SS). These models will be used in a real-time system that predicts a pilot’s mental state every second, a tool that can be used to help pilots recognize and recover from these mental states. Pilots wore different sensors that collected physiological data such as a 20-channel electroencephalography (EEG), respiration, and galvanic skin response (GSR). Pilots performed non-flight benchmark tasks designed to induce these states, and a flight simulation with “surprising” or “channelizing” events. The team created a pipeline to generate pilot-dependent models that train on benchmark data, are tuned on a portion of a flight task, and are deployed on the remaining flight task. The model is a series of anomaly-detection based ensembles, where each ensemble focuses on predicting a single state. Ensembles were composed of several anomaly detectors such as One Class SVMs, each focusing on a different subset of sensor data. We will discuss the performance of these models, as well as ongoing research on generalizing models across pilots and improving accuracy.
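
The general pattern described above, an ensemble of anomaly detectors each watching a different subset of sensor channels, can be sketched as follows. This is not the CSM team's pipeline; the feature layout, voting threshold, and data are invented, and scikit-learn is assumed.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Invented stand-ins for windowed physiological features:
# columns 0-19 ~ EEG band powers, 20 ~ respiration rate, 21 ~ GSR level.
baseline = rng.normal(0.0, 1.0, size=(500, 22))     # nominal-state training windows
new_windows = rng.normal(0.5, 1.5, size=(10, 22))   # windows to score in "real time"

# One detector per sensor subset; each votes anomalous (-1) or nominal (+1).
subsets = {"eeg": slice(0, 20), "respiration": slice(20, 21), "gsr": slice(21, 22)}
detectors = {}
for name, cols in subsets.items():
    model = make_pipeline(StandardScaler(), OneClassSVM(nu=0.05, kernel="rbf"))
    model.fit(baseline[:, cols])
    detectors[name] = (model, cols)

# Simple majority vote across detectors for each incoming window.
votes = np.column_stack([m.predict(new_windows[:, c]) for m, c in detectors.values()])
flagged = (votes == -1).mean(axis=1) >= 0.5
print("windows flagged as off-nominal:", np.where(flagged)[0])
```

---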
CYBER Penetration Testing and Statistical Analysis in DT&E | Timothy McLean | Wednesday, March 21 1:15 PM

Reconnaissance, footprinting, and enumeration are critical steps in the CYBER penetration testing process because if these steps are not fully and extensively executed, the information available for finding a system’s vulnerabilities may be limited. During the CYBER testing process, penetration testers often find themselves doing the same initial enumeration scans over and over for each system under test. Because of this, automated scripts have been developed that take these mundane and repetitive manual steps and perform them automatically with little user input. Once automation is present in the penetration testing process, Scientific Test and Analysis Techniques (STAT) can be incorporated. By combining automation and STAT in the CYBER penetration testing process, Mr. Tim McLean at Marine Corps Tactical Systems Support Activity (MCTSSA) coined a new term called CYBERSTAT. CYBERSTAT is applying scientific test and analysis techniques to offensive CYBER penetration testing tools with an important realization that CYBERSTAT assumes the system under test is the offensive penetration test tool itself. By applying combinatorial testing techniques to the CYBER tool, the CYBER tool’s scope is expanded beyond “one at a time” uses as the combinations of the CYBER tool’s capabilities and options are explored and executed as test cases against the target system. In CYBERSTAT, the additional test cases produced by STAT can be run automatically using scripts. This talk will show how MCTSSA is preparing to use CYBERSTAT in the Developmental Test and Evaluation process of USMC Command and Control systems.

---
Method for Evaluating the Quality of Cybersecurity Defenses | Shawn Whetstone | Wednesday, March 21 1:15 PM

This presentation discusses a methodology to use knowledge of cyber attacks and defender responses from operational assessments to gain insights into the defensive posture and to inform a strategy for improvement. The concept is to use the attack thread as the instrument to probe and measure the detection capability of the cyber defenses. The data enable a logistic regression approach to provide a quantitative basis for the analysis and recommendations.
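
A toy version of the quantitative core described above: logistic regression of a detect/no-detect response on characteristics of the attack steps. The predictors and data are invented for illustration and statsmodels is assumed; the presenter's actual factors and model are not shown here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200

# Hypothetical attack-thread steps: how "loud" the technique is and whether
# it crossed a monitored network boundary; 'detected' is the defender response.
df = pd.DataFrame({
    "noise_level": rng.uniform(0, 10, n),
    "crossed_boundary": rng.integers(0, 2, n),
})
logit_p = -3.0 + 0.4 * df["noise_level"] + 1.5 * df["crossed_boundary"]
df["detected"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

fit = smf.logit("detected ~ noise_level + crossed_boundary", data=df).fit(disp=False)
print(fit.summary())
# Estimated detection probability for a quiet step that crosses a boundary:
print(fit.predict(pd.DataFrame({"noise_level": [2.0], "crossed_boundary": [1]})))
```

---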
Metrics to Characterize Temporal Patterns in Lifespans of Artifacts | Soumyo Moitra | Wednesday, March 21 1:15 PM

This paper discusses some metrics that are useful to model the lifespans of artifacts observed over time that do not follow a traditional lifecycle. Some examples of these stochastic point processes are presented and some metrics are developed that help to characterize their patterns and allow the lifespans to be clustered. Results from simulated data are presented and the paper discusses how the metrics can be applied and interpreted.

---
Uncertainty Quantification with Mixed Uncertainty Sources | Tom West | Wednesday, March 21 1:15 PM | Materials TBD

Over the past decade, uncertainty quantification has become an integral part of engineering design and analysis. Both NASA and the DoD are making significant investments to advance the science of uncertainty quantification, increase the knowledge base, and strategically expand its use. This increased use of uncertainty-based results improves investment strategies and decision making. However, in complex systems, many challenges still exist when dealing with uncertainty in cases that have sparse, unreliable, poorly understood, and/or conflicting data. Oftentimes, assumptions are made regarding the statistical nature of data that may not be well grounded, and the impact of those assumptions is not well understood, which can lead to ill-informed decision making. This talk will focus on the quantification of uncertainty when both well-characterized (aleatory) and poorly known (epistemic) uncertainty sources exist. Particular focus is given to the treatment and management of epistemic uncertainty. A summary of non-probabilistic methods will be presented along with the propagation of mixed uncertainty and optimization under uncertainty. A discussion of decision making under uncertainty is also included to illustrate the use of uncertainty quantification.
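
One standard way to handle the mixed case described above is double-loop (second-order) propagation: an outer loop over epistemic values known only to an interval, and an inner Monte Carlo loop over the well-characterized aleatory variability, which yields an interval on any statistic of interest rather than a single number. The response function and bounds below are invented; this is a sketch of the general idea, not of the specific methods surveyed in the talk.

```python
import numpy as np

rng = np.random.default_rng(2)

def response(x, theta):
    """Toy performance model: aleatory input x, epistemic parameter theta."""
    return theta * x**2 + 0.5 * x

n_outer, n_inner = 20, 5000
theta_bounds = (0.8, 1.2)          # epistemic: only an interval is known
quantile = 0.95

# Outer loop over epistemic values, inner Monte Carlo over aleatory variability.
q95 = []
for theta in np.linspace(*theta_bounds, n_outer):
    x = rng.normal(loc=1.0, scale=0.2, size=n_inner)   # aleatory: well characterized
    q95.append(np.quantile(response(x, theta), quantile))

# Epistemic uncertainty shows up as an interval on the aleatory 95th percentile.
print(f"95th-percentile response lies in [{min(q95):.3f}, {max(q95):.3f}] "
      f"over the assumed epistemic interval for theta")
```

---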
System Level Uncertainty Quantification for Low-Boom Supersonic Flight Vehicles | Ben Phillips | Wednesday, March 21 1:15 PM | Materials TBD

Under current FAA regulations, civilian aircraft may not operate at supersonic speeds over land. However, over the past few decades, there have been renewed efforts to invest in technologies to mitigate sonic boom from supersonic aircraft through advances in both vehicle design and sonic boom prediction. NASA has heavily invested in tools and technologies to enable commercial supersonic flight and currently has several technical challenges related to sonic boom reduction. One specific technical challenge relates to the development of tools and methods to predict, under uncertainty, the noise on the ground generated by an aircraft flying at supersonic speeds. In attempting to predict ground noise, many factors from multiple disciplines must be considered. Further, classification and treatment of uncertainties in coupled systems, multifidelity simulations, experimental data, and community responses are all concerns in system level analysis of sonic boom prediction. This presentation will introduce the various methodologies and techniques utilized for uncertainty quantification with a focus on the build-up to system level analysis. An overview of recent research activities and case studies investigating the impact of various disciplines and factors on variance in ground noise will be discussed.

---
Uncertainty Quantification and Analysis at The Boeing Company | John Schaefer | Wednesday, March 21 1:15 PM

The Boeing Company is assessing uncertainty quantification methodologies across many phases of aircraft design in order to establish confidence in computational fluid dynamics-based simulations of aircraft performance. This presentation provides an overview of several of these efforts. First, the uncertainty in aerodynamic performance metrics of a commercial aircraft at transonic cruise due to turbulence model and flight condition variability is assessed using 3D CFD with non-intrusive polynomial chaos and second order probability. Second, a sample computation of uncertainty in increments is performed for an engineering trade study, leading to the development of a new method for propagating input-uncontrolled uncertainties as well as input-controlled uncertainties. This type of consideration is necessary to account for variability associated with grid convergence on different configurations, for example. Finally, progress toward applying the computed uncertainties in forces and moments into an aerodynamic database used for flight simulation will be discussed. This approach uses a combination of Gaussian processes and multiple-fidelity Kriging meta-modeling to synthesize the required data.
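
As a generic illustration of the last step mentioned above, synthesizing a database with Gaussian-process (Kriging) meta-models, here is a minimal single-fidelity sketch using scikit-learn. The lift-coefficient function and flight conditions are invented, and the multiple-fidelity co-Kriging used in practice is more involved than what is shown.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)

# Invented training data: lift coefficient vs. (Mach, angle of attack) from "CFD runs".
X = np.column_stack([rng.uniform(0.70, 0.90, 40),    # Mach
                     rng.uniform(0.0, 6.0, 40)])     # alpha, deg
y = 0.1 + 0.09 * X[:, 1] - 0.5 * (X[:, 0] - 0.8) ** 2 + rng.normal(0, 0.002, 40)

kernel = RBF(length_scale=[0.05, 1.0]) + WhiteKernel(noise_level=1e-5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Predict with uncertainty at a new flight condition.
mean, std = gp.predict(np.array([[0.82, 3.0]]), return_std=True)
print(f"CL estimate = {mean[0]:.4f} +/- {2 * std[0]:.4f} (2-sigma)")
```

---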
Interface design for analysts in a data and analysis-rich environment | Karin Butler | Wednesday, March 21 3:00 PM

Increasingly humans will rely on the outputs of our computational partners to make sense of the complex systems in our world. If they are to be employed, the statistical and algorithmic analysis tools deployed to analysts’ toolboxes must afford their proper use and interpretation. Interface design for these tool users should provide decision support appropriate for the current stage of sensemaking. Understanding how users build, test, and elaborate their mental models of complex systems can guide the development of robust interfaces.

---
Asparagus is the most articulate vegetable ever | Kerstan Cole | Wednesday, March 21 3:00 PM | Materials TBD

During the summer of 2001, Microsoft launched Windows XP, which was lauded by many users as the most reliable and usable operating system at the time. Miami Herald columnist Dave Barry responded to this praise by stating that “this is like saying asparagus is the most articulate vegetable ever.” Whether you agree or disagree with Dave Barry, these users’ reactions are relative (to other operating systems and to other past and future versions). This is due to an array of technological factors that have facilitated human-system improvements. Automation is often cited as improving human-system performance across many domains. It is true that when the human and automation are aligned, performance improves. But, what about the times that this is not the case? This presentation will describe the myths and facts about human-system performance and increasing levels of automation through examples of human-system R&D conducted on a satellite ground system. Factors that affect human-system performance and a method to characterize mission performance as it relates to increasing levels of automation will also be discussed.

---
Mitigating Pilot Disorientation with Synthetic Vision Displays | Kathryn Ballard | Wednesday, March 21 3:00 PM

Loss of control in flight has been a leading cause of accidents and incidents in commercial aviation worldwide. The Commercial Aviation Safety Team (CAST) requested studies on virtual day-visual meteorological conditions displays, such as synthetic vision, in order to combat loss of control. Over the last four years NASA has conducted a series of experiments evaluating the efficacy of synthetic vision displays for increased spatial awareness. Commercial pilots with various levels of experience from both domestic and international airlines were used as subjects. This presentation describes the synthetic vision research and how pilot subjects affected experiment design and statistical analyses.

---
Single Event Upset Prediction for the Stratospheric Aerosol and Gas Experiment Autonomous Instrument | Ray McCollum | Wednesday, March 21 3:00 PM

The Stratospheric Aerosol and Gas Experiment (SAGE III) aboard the International Space Station (ISS) was experiencing a series of anomalies called Single Event Upsets (SEUs). Booz Allen Hamilton was tasked with conducting a statistical analysis to model the incidence of SEUs in the SAGE III equipment aboard the ISS. The team identified factors correlated with SEU incidences, set up a model to track degradation of SAGE III, and showed current and past probabilities as a function of the space environment. The space environment of SAGE III was studied to identify possible causes of SEUs. The analysis revealed variables most correlated with the anomalies, including solar wind strength, solar and geomagnetic field behavior, and location/orientation of the ISS, sun, and moon. The data was gathered from a variety of sources including US government agencies, foreign and domestic academic centers, and state-of-the-art simulation algorithms and programs. Logistic regression was used to analyze SEUs and gain preliminary results. The data was divided into small time intervals to approximate independence and allow logistic regression. Due to the rarity of events the initial model results were based on few SEUs. The team set up a Graphical User Interface (GUI) program to automatically analyze new data as it became available to the SAGE III team. A GUI was built to allow the addition of more data over the life of the SAGE III mission. As more SEU incidents occur and are entered into the model, its predictive power will grow significantly. The GUI enables the user to easily rerun the regression analysis and visualize its results to inform operational decision making.

---
Development of an Instrument to Measure Trust of Automated Systems | Heather Wojton | Wednesday, March 21 3:00 PM

Automated systems are technologies that actively select data, transform information, make decisions, and control processes. The U.S. military uses automated systems to perform search and rescue and reconnaissance missions, and to assume control of aircraft to avoid ground collision. Facilitating appropriate trust in automated systems is essential to improving the safety and performance of human-system interactions. In two studies, we developed and validated an instrument to measure trust in automated systems. In study 1, we demonstrated that the scale has a 2-factor structure and demonstrates concurrent validity. We replicated these results using an independent sample in study 2.

---
Operational Evaluation of a Flight-deck Software Application | Sara Wilson | Wednesday, March 21 3:00 PM

Traffic Aware Strategic Aircrew Requests (TASAR) is a NASA-developed operational concept for flight efficiency and route optimization for the near-term airline flight deck. TASAR provides the aircrew with a cockpit automation tool that leverages a growing number of information sources on the flight deck to make fuel- and time-saving route optimization recommendations while en route. In partnership with a commercial airline, research prototype software that implements TASAR has been installed on three aircraft to enable the evaluation of this software in operational use. During the flight trials, data are being collected to quantify operational performance, which will enable NASA to improve algorithms and enhance functionality in the software based on real-world user experience. This presentation highlights statistical challenges and discusses lessons learned during the initial stages of the operational evaluation.

---
Comparing M&S Output to Live Test Data: A Missile System Case Study | Kelly Avery | Wednesday, March 21 4:45 PM

In the operational testing of DoD weapons systems, modeling and simulation (M&S) is often used to supplement live test data in order to support a more complete and rigorous evaluation. Before the output of the M&S is included in reports to decision makers, it must first be thoroughly verified and validated to show that it adequately represents the real world for the purposes of the intended use. Part of the validation process should include a statistical comparison of live data to M&S output. This presentation includes an example of one such validation analysis for a tactical missile system. In this case, the goal is to validate a lethality model that predicts the likelihood of destroying a particular enemy target. Using design of experiments, along with basic analysis techniques such as the Kolmogorov-Smirnov test and Poisson regression, we can explore differences between the M&S and live data across multiple operational conditions and quantify the associated uncertainties.
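
The two comparison tools named above can be sketched on made-up data as follows: a two-sample Kolmogorov-Smirnov test for a continuous outcome such as miss distance, and a Poisson regression checking whether a count outcome differs by data source after accounting for an operational condition. SciPy and statsmodels are assumed; none of the numbers reflect the actual missile system analysis.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)

# Invented miss distances (m) from live shots and M&S runs.
live_miss = rng.gamma(shape=2.0, scale=1.5, size=30)
sim_miss = rng.gamma(shape=2.0, scale=1.7, size=300)
stat, p = ks_2samp(live_miss, sim_miss)
print(f"KS test: D = {stat:.3f}, p = {p:.3f}")

# Invented fragment-hit counts by data source and target type.
df = pd.DataFrame({
    "hits": np.concatenate([rng.poisson(4, 30), rng.poisson(5, 300)]),
    "source": ["live"] * 30 + ["sim"] * 300,
    "target": rng.choice(["truck", "bunker"], size=330),
})
pois = smf.poisson("hits ~ source + target", data=df).fit(disp=False)
print(pois.summary())   # a significant 'source' term flags a live-vs-sim difference
```

---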
NASA’s Human Exploration Research Analog (HERA): An analog mission for isolation, confinement, and remote conditions in space exploration scenarios | Shelley Cazares | Wednesday, March 21 4:45 PM | Materials TBD

Shelley Cazares served as a crewmember of the 14th mission of NASA’s Human Exploration Research Analog (HERA). In August 2017, Dr. Cazares and her three crewmates were enclosed in an approximately 600-sq. ft. simulated spacecraft for an anticipated 45 days of confined isolation at Johnson Space Center, Houston, TX. In preparation for long-duration missions to Mars in the 2030s and beyond, NASA seeks to understand what types of diets, habitats, and activities can keep astronauts healthy and happy on deep space voyages. To collect this information, NASA is conducting several analog missions simulating the conditions astronauts face in space. HERA is a set of experiments to investigate the effects of isolation, confinement, and remote conditions in space exploration scenarios. Dr. Cazares will discuss the application procedure, the pre-mission training process, the life and times inside the habitat during the mission, and her crew’s emergency evacuation from the habitat due to the risk of rising floodwaters in Hurricane Harvey.

---
Lessons Learned in Reliability | Dan Telford | Wednesday, March 21 4:45 PM

Although reliability analysis is a part of Operational Test and Evaluation, it is uncommon for analysts to have a background in reliability theory or experience applying it. This presentation highlights some lessons learned from reliability analysis conducted on several AFOTEC test programs. Topics include issues related to censored data, limitations and alternatives to using the exponential distribution, and failure rate analysis using test data.

---
Design and Analysis of Nonlinear Models for the Mars 2020 Rover | Jim Wisnowski | Wednesday, March 21 4:45 PM

The Mars 2020 Rover team commonly faces nonlinear behavior across the test program that is often closely related to the underlying physics. Classical and newer response surface designs do well with quadratic approximations while space filling designs have proven useful for modeling & simulation of complex surfaces. This talk specifically covers fitting nonlinear equations based on engineering functional forms as well as sigmoid and exponential decay curves. We demonstrate best practices on how to design and augment nonlinear designs using the Bayesian D-Optimal Criteria. Several examples, to include drill bit degradation, illustrate the relative ease of implementation with popular software and the utility of these methods.
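
A small sketch of the kind of nonlinear fit mentioned above: an exponential-decay functional form fit by nonlinear least squares with SciPy. The data are invented stand-ins (not Mars 2020 test data), and the Bayesian D-optimal design and augmentation step discussed in the talk is a separate activity not shown here.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(5)

def decay(x, a, b, c):
    """Engineering functional form: exponential decay toward an asymptote c."""
    return a * np.exp(-b * x) + c

# Invented degradation data, e.g., a performance measure vs. cumulative usage.
x = np.linspace(0, 50, 25)
y = decay(x, a=3.0, b=0.12, c=1.0) + rng.normal(0, 0.08, x.size)

popt, pcov = curve_fit(decay, x, y, p0=[2.0, 0.1, 0.5])
perr = np.sqrt(np.diag(pcov))
for name, est, se in zip("abc", popt, perr):
    print(f"{name} = {est:.3f} +/- {se:.3f}")
```

---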
Classes of Second-Order Split-Plot Designs | Luis Cortes | Thursday, March 22 10:00 AM

The fundamental principles of experimental design are factorization, replication, randomization, and local control of error. In many industries, however, departure from these principles is commonplace. Often in our experiments complete randomization is not feasible because the factor level settings are hard, impractical, or inconvenient to change or the resources available to execute under homogeneous conditions are limited. These restrictions in randomization lead to split-plot experiments. We are also often interested in fitting second-order models leading to second-order split-plot experiments. Although response surface methodology has grown tremendously since 1951, alternatives for second-order split-plots remain largely unexplored. The literature and textbooks offer limited examples and provide guidelines that often are too general. This deficit of information leaves practitioners ill prepared to face the many roadblocks associated with these types of designs. This presentation provides practical strategies to help practitioners in dealing with second-order split-plot and, by extension, split-split-plot experiments, including an innovative approach for the construction of a response surface design referred to as second-order sub-array Cartesian product split-plot design. This new type of design, which is an alternative to other classes of split-plot designs that are currently in use in defense and industrial applications, is economical, has a low prediction variance of the regression coefficients, and low aliasing between model terms. Based on an assessment using well-accepted key design evaluation criteria, second-order sub-array Cartesian product split-plot designs perform as well as historical designs that have been considered standards up to this point.

---
The Development and Execution of Split-Plot Designs In Navy Operational Test and Evaluation: A Practitioner’s Perspective | Stargel Doane | Thursday, March 22 10:00 AM | Materials TBD

Randomization is one of the basic principles of experimental design and the associated statistical methods. In Navy operational testing, complete randomization is often not possible due to scheduling or execution constraints. Given these constraints, operational test designers often utilize split-plot designs to accommodate the hard-to-change nature of various factors of interest. Several case studies will be presented to provide insight into the challenges associated with Navy operational test design and execution.

---
B-52 Radar Modernization Test Design Considerations | Stuart Corbett & Joseph Maloney | Thursday, March 22 10:00 AM

Inherent system processes, restrictions on collection, or cost may impact the practical execution of an operational test. This study presents the use of blocking and split-plot designs when complete randomization is not feasible in operational test. Specifically, the USAF B-52 Radar Modernization Program test design is used to present tradeoffs of different design choices and the impacts of those choices on cost, operational relevance, and analytical rigor.

---
Experimental Design of a Unique Force Measurement System Calibration | Ken Toro | Thursday, March 22 10:00 AM | Materials TBD

Aerodynamic databases for space flight vehicles rely on wind-tunnel tests utilizing precision force measurement systems (FMS). Recently, NASA’s Space Launch System (SLS) program has conducted numerous wind-tunnel tests. This presentation will focus on the calibration of a unique booster FMS through the use of design of experiments (DoE) and regression modeling. Utilization of DoE resulted in a sparse, time-efficient design with results exceeding the researchers’ expectations.

---
Application of Design of Experiments to a Calibration of the National Transonic Facility | Matt Rhode & Matt Bailey | Thursday, March 22 10:00 AM

Recent work at the National Transonic Facility (NTF) at the NASA Langley Research Center has shown that a substantial reduction in freestream pressure fluctuations can be achieved by positioning the moveable model support walls and plenum re-entry flaps to choke the flow just downstream of the test section. This choked condition reduces the upstream propagation of disturbances from the diffuser into the test section, resulting in improved Mach number control and reduced freestream variability. The choked conditions also affect the Mach number gradient and distribution in the test section, so a calibration experiment was undertaken to quantify the effects of the model support wall and re-entry flap movements on the facility freestream flow using a centerline static pipe. A design of experiments (DOE) approach was used to develop restricted-randomization experiments to determine the effects of total pressure, reference Mach number, model support wall angle, re-entry flap gap height, and test section longitudinal location on the centerline static pressure and local Mach number distributions for a reference Mach number range from 0.7 to 0.9. Tests were conducted using air as the test medium at a total temperature of 120 °F as well as for gaseous nitrogen at cryogenic total temperatures of -50, -150, and -250 °F. The resulting data were used to construct quadratic polynomial regression models for these factors using a Restricted Maximum Likelihood (REML) estimator approach. Independent validation data were acquired at off-design conditions to check the accuracy of the regression models. Additional experiments were designed and executed over the full Mach number range of the facility (0.2 ≤ Mref ≤ 1.1) at each of the four total temperature conditions, but with the model support walls and re-entry flaps set to their nominal positions, in order to provide calibration regression models for operational experiments where a choked condition downstream of the test section is either not feasible or not required. This presentation focuses on the design, execution, analysis, and results for the two experiments performed using air at a total temperature of 120 °F. Comparisons are made between the regression model output and validation data, as well as the legacy NTF calibration results, and future work is discussed.
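
For readers unfamiliar with the model form mentioned above, here is a generic sketch of a quadratic polynomial fit by REML with a random intercept for a restricted-randomization group, using statsmodels. The factors, grouping, and data are invented; this is not the NTF calibration model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n_blocks, runs_per_block = 8, 6
n = n_blocks * runs_per_block

df = pd.DataFrame({
    "block": np.repeat(np.arange(n_blocks), runs_per_block),   # restricted-randomization group
    "mach": rng.uniform(0.7, 0.9, n),                          # reference Mach number
    "wall": rng.uniform(-1.0, 1.0, n),                         # wall angle (coded units)
})
block_eff = rng.normal(0, 0.002, n_blocks)[df["block"]]
df["local_mach"] = (df["mach"] + 0.01 * df["wall"] - 0.02 * df["wall"] ** 2
                    + block_eff + rng.normal(0, 0.001, n))

# Quadratic response-surface model with a random block intercept, fit by REML.
model = smf.mixedlm("local_mach ~ mach + wall + I(mach**2) + I(wall**2) + mach:wall",
                    data=df, groups=df["block"])
fit = model.fit(reml=True)
print(fit.summary())
```

---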
The Use of DOE vs OFAT in the Calibration of AEDC Wind Tunnels | Rebecca Rought | Thursday, March 22 10:00 AM

The use of statistically rigorous methods to support testing at Arnold Engineering Development Complex (AEDC) has been an area of focus in recent years. As part of this effort, the use of Design of Experiments (DOE) has been introduced for calibration of AEDC wind tunnels. Historical calibration efforts used One-Factor-at-a-Time (OFAT) test matrices, with a concentration on conditions of interest to test customers. With the introduction of DOE, the number of test points collected during the calibration decreased, and the points were not necessarily located at historical calibration points. To validate the use of DOE for calibration purposes, the 4-ft Aerodynamic Wind Tunnel 4T was calibrated using both DOE and OFAT methods. The results from the OFAT calibration were compared to the model developed from the DOE data points and it was determined that the DOE model sufficiently captured the tunnel behavior within the desired levels of uncertainty. DOE analysis also showed that within Tunnel 4T, systematic errors are insignificant as indicated by agreement noted between the two methods. Based on the results of this calibration, a decision was made to apply DOE methods to future tunnel calibrations, as appropriate.

The development of the DOE matrix in Tunnel 4T required the consideration of operational limitations, measurement uncertainties, and differing tunnel behavior over the performance map. Traditional OFAT methods allowed tunnel operators to set conditions efficiently while minimizing time consuming plant configuration changes. DOE methods, however, require the use of randomization which had the potential to add significant operation time to the calibration. Additionally, certain tunnel parameters, such as variable porosity, are only of interest in a specific region of the performance map. In addition to operational concerns, measurement uncertainty was an important consideration for the DOE matrix. At low tunnel total pressures, the uncertainty in the Mach number measurements increases significantly. Aside from introducing non-constant variance into the calibration model, the large uncertainties at low pressures can increase overall uncertainty in the calibration in high pressure regions where the uncertainty would otherwise be lower. At high pressures and transonic Mach numbers, low Mach number uncertainties are required to meet drag count uncertainty requirements. To satisfy both the operational and calibration requirements, the DOE matrix was divided into multiple independent models over the tunnel performance map.

Following the Tunnel 4T calibration, AEDC calibrated the Propulsion Wind Tunnel 16T, Hypersonic Wind Tunnels B and C, and the National Full-Scale Aerodynamics Complex (NFAC). DOE techniques were successfully applied to the calibration of Tunnel B and NFAC, while a combination of DOE and OFAT test methods were used in Tunnel 16T because of operational and uncertainty requirements over a portion of the performance map. Tunnel C was calibrated using OFAT because of operational constraints. The cost of calibrating these tunnels has not been significantly reduced through the use of DOE, but the characterization of test condition uncertainties is firmly based in statistical methods.

---
Initial Investigation into the Psychoacoustic Properties of Small Unmanned Aerial System Noise | Andrew Christian | Thursday, March 22 1:15 PM

For the past several years, researchers at NASA Langley have been engaged in a series of projects to study the degree to which existing facilities and capabilities, originally created for work on full-scale aircraft, are extensible to smaller scales – those of the small unmanned aerial systems (sUAS, also UAVs and, colloquially, ‘drones’) that have been showing up in the nation’s airspace. This paper follows an effort that has led to an initial human-subject psychoacoustic test regarding the annoyance generated by sUAS noise. This effort spans three phases: (1) the collection of the sounds through field recordings, (2) the formulation and execution of a psychoacoustic test using those recordings, and (3) the analysis of the data from that test. The data suggests a lack of parity between the noise of the recorded sUAS and that of a set of road vehicles that were also recorded and included in the test, as measured by a set of contemporary noise metrics.

---
Combining Human Factors Data and Models of Human Performance | Cynthia Null | Thursday, March 22 1:15 PM

As systems and missions become increasingly complex, the roles of humans throughout the mission life cycle are evolving. In areas such as maintenance and repair, hands-on tasks still dominate; however, new technologies have changed many tasks. For example, some critical human tasks have moved from manual control to supervisory control, often of systems at great distances (e.g., remotely piloting a vehicle, or science data collection on Mars). While achieving mission success remains the key human goal, almost all human performance metrics focus on failures rather than successes. This talk will examine the role of humans in creating mission success as well as new approaches for system validation testing needed to keep up with evolving systems and human roles.

---
A Multi-method, Triangulation Approach to Operational Testing | Daniel Porter | Thursday, March 22 1:15 PM

Humans are not produced in quality-controlled assembly lines, and we typically are much more variable than the mechanical systems we employ. This mismatch means that when characterizing the effectiveness of a system, the system must be considered in the context of its users. Accurate measurement is critical to this endeavor, yet while human variability is large, effort to reduce measurement error of those humans is relatively small. The following talk discusses the importance of using multiple measurement methods—triangulation—to reduce error and increase confidence when characterizing the quality of human-system integration (HSI). A case study from an operational test of an attack helicopter demonstrates how triangulation enables more actionable recommendations.

---
Illustrating the Importance of Uncertainty Quantification (UQ) in Munitions Modeling | Donald Carlucci | Thursday, March 22 1:15 PM | Materials TBD

The importance of the incorporation of Uncertainty Quantification (UQ) techniques into the design and analysis of Army systems is discussed. Relevant examples are presented where UQ would have been extremely useful. The intent of the presentation is to show the broad relevance of UQ and how, in the future, it will greatly improve the time to fielding and quality of developmental items.

---
Space-Filling Designs for Robustness Experiments | Roshan Vengazhiyil | Thursday, March 22 1:15 PM | Materials TBD

To identify the robust settings of the control factors, it is very important to understand how they interact with the noise factors. In this article, we propose space-filling designs for computer experiments that are more capable of accurately estimating the control-by-noise interactions. Moreover, the existing space-filling designs focus on uniformly distributing the points in the design space, which are not suitable for noise factors because they usually follow non-uniform distributions such as normal distribution. This would suggest placing more points in the regions with high probability mass. However, noise factors also tend to have a smooth relationship with the response and therefore, placing more points towards the tails of the distribution is also useful for accurately estimating the relationship. These two opposing effects make the experimental design methodology a challenging problem. We propose optimal and computationally efficient solutions to this problem and demonstrate their advantages using simulated examples and a real industry example involving a manufacturing packing line.
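
The design idea described above, keeping control factors space-filling while placing noise-factor points according to their probability distribution, can be roughly approximated by pushing Latin hypercube coordinates through the noise factor's inverse CDF. The sketch below shows only that crude transformation (SciPy assumed); it is not the authors' optimal construction.

```python
import numpy as np
from scipy.stats import qmc, norm

# 30-run space-filling design in one control factor and one noise factor.
sampler = qmc.LatinHypercube(d=2, seed=7)
u = sampler.random(30)                      # uniform Latin hypercube points in [0, 1]^2

control = qmc.scale(u[:, [0]], l_bounds=[0.0], u_bounds=[10.0])  # uniform over its range
noise = norm.ppf(u[:, 1], loc=0.0, scale=1.0)                    # N(0, 1) noise factor

design = np.column_stack([control.ravel(), noise])
print(design[:5])
# Control settings stay uniform; noise settings concentrate where the noise
# distribution has mass, while the stratification still reaches the tails.
```

---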
Introduction of Uncertainty Quantification and Industry Challenges | Peter Chien | Thursday, March 22 1:15 PM | Materials TBD

Uncertainty is an inescapable reality that can be found in nearly all types of engineering analyses. It arises from sources like measurement inaccuracies, material properties, boundary and initial conditions, and modeling approximations. For example, the increasing use of numerical simulation models throughout industry promises improved design and insight at significantly lower costs and shorter timeframes than purely physical testing. However, the addition of numerical modeling has also introduced complexity and uncertainty to the process of generating actionable results. It has become not only possible, but vital to include Uncertainty Quantification (UQ) in engineering analysis. The competitive benefits of UQ include reduced development time and cost, improved designs, better understanding of risk, and quantifiable confidence in analysis results and engineering decisions. Unfortunately, there are significant cultural and technical challenges which prevent organizations from utilizing UQ methods and techniques in their engineering practice. This presentation will introduce UQ methodology and discuss the past and present strategies for addressing these challenges, making it possible to use UQ to enhance engineering processes with fewer resources and in more situations. Looking to the future, anticipated challenges will be discussed along with an outline of the path towards making UQ a common practice in engineering.

---
The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size | Kristina Carter | Thursday, March 22 3:00 PM

Mixed models are ideally suited to analyzing nested data from within-persons designs, designs that are advantageous in applied research. Mixed models have the advantage of enabling modeling of random effects, facilitating an accounting of the intra-person variation captured by multiple observations of the same participants and suggesting further lines of control to the researcher. However, the sampling requirements for mixed models are prohibitive for other areas that could greatly benefit from them. This simulation study examines the impact of small sample sizes in both levels of the model on the fixed effect bias, type I error, and power of a simple mixed model analysis. Despite the need for adjustments to control for type I error inflation, findings indicate that smaller samples than previously recognized can be used for mixed models under certain conditions prevalent in applied research. Examination of the marginal benefit of increases in the number of subjects and the number of observations per subject provides applied researchers with guidance for developing mixed-model repeated measures designs that maximize power.
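
A compressed example of the kind of simulation described above: repeatedly generate nested data with a null fixed effect at small level-1 and level-2 sizes, fit a random-intercept model, and tabulate the empirical type I error. Sample sizes, variance components, and the number of replications are arbitrary choices for illustration; statsmodels is assumed, and a real study would use far more replications.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n_subjects, n_obs = 10, 5           # small level-2 and level-1 sizes
n_sims, alpha = 200, 0.05

rejections = 0
for _ in range(n_sims):
    subj = np.repeat(np.arange(n_subjects), n_obs)
    x = rng.normal(size=n_subjects * n_obs)          # level-1 predictor
    u = rng.normal(0, 1.0, n_subjects)[subj]         # random intercepts
    y = 0.0 * x + u + rng.normal(0, 1.0, subj.size)  # true fixed effect = 0
    df = pd.DataFrame({"y": y, "x": x, "subj": subj})
    fit = smf.mixedlm("y ~ x", data=df, groups=df["subj"]).fit(reml=True)
    rejections += fit.pvalues["x"] < alpha

print(f"empirical type I error: {rejections / n_sims:.3f} (nominal {alpha})")
```

---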
Evaluating Deterministic Models of Time Series by Comparison to Observations | Amy Braverman | Thursday, March 22 3:00 PM

A standard paradigm for assessing the quality of model simulations is to compare what these models produce to experimental or observational samples of what the models seek to predict. Often these comparisons are based on simple summary statistics, even when the objects of interest are time series. Here, we propose a method of evaluation through probabilities derived from tests of hypotheses that model-simulated and observed time sequences share common signals. The probabilities are based on the behavior of summary statistics of model output and observational data, over ensembles of pseudo-realizations. These are obtained by partitioning the original time sequences into signal and noise components, and using a parametric bootstrap to create pseudo-realizations of the noise. We demonstrate with an example from climate model evaluation for which this methodology was developed.
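
A bare-bones sketch of the resampling machinery described above: split each series into a smoothed signal plus noise, fit a simple parametric noise model (here AR(1)), and build pseudo-realizations by adding simulated noise back onto the estimated signals so the behavior of a summary statistic can be tabulated. The series, the moving-average smoother, and the AR(1) choice are all invented stand-ins for the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 240

def smooth(x, window=12):
    """Crude signal estimate: a centered moving average stands in for a real smoother."""
    return np.convolve(x, np.ones(window) / window, mode="same")

def ar1_fit(e):
    """Fit an AR(1) noise model from the lag-1 correlation and residual spread."""
    phi = np.corrcoef(e[:-1], e[1:])[0, 1]
    sigma = np.std(e[1:] - phi * e[:-1])
    return phi, sigma

def ar1_sim(phi, sigma, n):
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + rng.normal(0.0, sigma)
    return e

# Invented "observed" and "model" monthly series sharing a seasonal signal.
t = np.arange(n)
signal = 0.5 * np.sin(2 * np.pi * t / 12)
obs = signal + ar1_sim(0.6, 0.3, n)
mod = signal + ar1_sim(0.5, 0.25, n)

# Summary statistic: correlation between the two estimated signal components.
stat = np.corrcoef(smooth(obs), smooth(mod))[0, 1]

# Pseudo-realizations: keep each estimated signal, add parametric-bootstrap noise.
phi_o, sig_o = ar1_fit(obs - smooth(obs))
phi_m, sig_m = ar1_fit(mod - smooth(mod))
boot = [np.corrcoef(smooth(smooth(obs) + ar1_sim(phi_o, sig_o, n)),
                    smooth(smooth(mod) + ar1_sim(phi_m, sig_m, n)))[0, 1]
        for _ in range(500)]

print(f"statistic on original series: {stat:.3f}")
print(f"bootstrap 5th-95th percentile: "
      f"[{np.quantile(boot, 0.05):.3f}, {np.quantile(boot, 0.95):.3f}]")
```

---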
Challenger Challenge: Pass-Fail Thinking Increases Risk Measurably | Kenneth Johnson | Thursday, March 22 3:00 PM

Binomial (pass-fail) response metrics are far more commonly used in test, requirements, quality and engineering than they need to be. In fact, there is even an engineering school of thought that they’re superior to continuous-variable metrics. This is a serious, even dangerous problem in aerospace and other industries: think of the Space Shuttle Challenger accident. There are better ways. This talk will cover some examples of methods available to engineers and statisticians in common statistical software. It will not dig far into the mathematics of the methods, but will walk through where each method might be most useful and some of the pitfalls inherent in their use – including potential sources of misinterpretation and suspicion by your teammates and customers. The talk is geared toward engineers, managers and professionals in the -ilities who run into frustrations dealing with pass-fail data and thinking.
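
A quick numerical illustration of the point above: the same simulated shift analyzed with the continuous measurements versus with pass/fail calls against a spec limit, showing how much detection power the dichotomization throws away. All numbers are invented and SciPy is assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n, n_sims, limit = 30, 1000, 2.0           # units per group, simulations, spec limit

detect_cont = detect_pf = 0
for _ in range(n_sims):
    baseline = rng.normal(1.0, 0.5, n)     # continuous margin measurements
    degraded = rng.normal(1.3, 0.5, n)     # true shift toward the spec limit

    # Continuous analysis: compare the measured margins directly.
    detect_cont += stats.ttest_ind(baseline, degraded).pvalue < 0.05

    # Pass/fail analysis: only record whether each unit exceeded the limit.
    table = [[(baseline > limit).sum(), n - (baseline > limit).sum()],
             [(degraded > limit).sum(), n - (degraded > limit).sum()]]
    detect_pf += stats.fisher_exact(table)[1] < 0.05

print(f"power, continuous metric: {detect_cont / n_sims:.2f}")
print(f"power, pass/fail metric:  {detect_pf / n_sims:.2f}")
```

---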
Doppler Assisted Sensor Fusion for Tracking and Exploitation | J. Derek Tucker | Thursday, March 22 3:00 PM

We have developed a new sensor fusion approach called Doppler Assisted Sensor Fusion (DASF), which pairs a range-rate profile from one moving sensor with a range-rate profile from another sensor with high location accuracy. This pairing provides accurate identification, location, and tracking of moving emitters, with low association latency. The approach we use for data fusion is distinct from previous approaches. In the conventional approach, post-detection data from each sensor is overlaid with data from another sensor in an attempt to associate the data outputs. For the DASF approach, the fusion is at the sensor level: the first sensor collects data and provides the standard identification in addition to a unique emitter range-rate profile. This profile is used to associate the emitter signature to a range-rate signature obtained by the geolocation sensor. The geolocation sensor then provides the desired location accuracy. We will provide results using real tracking data scenarios.

---
XPCA: A Copula-based Generalization of PCA for Ordinal Data | Cliff Anderson-Bergman | Thursday, March 22 3:00 PM

Principal Component Analysis is a standard tool in an analyst’s toolbox. The standard practice of rescaling each column can be reframed as a copula-based decomposition in which the marginal distributions are fit with a univariate Gaussian distribution and the joint distribution is modeled with a Gaussian copula. In this light, we present an alternative to traditional PCA, which we call XPCA, that relaxes the marginal Gaussian assumption and instead fits each marginal distribution with the empirical distribution function. Interval-censoring methods are used to account for the discrete nature of the empirical distribution function when fitting the Gaussian copula model. In this talk, we derive the XPCA estimator and inspect the differences in fits on both simulated and real data applications.
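
Ignoring the interval-censoring refinement that the authors add for discrete data, the core construction above can be roughly approximated by mapping each column through its empirical CDF, taking normal quantiles, and running ordinary PCA on the resulting scores. The sketch below does exactly that on invented ordinal data (SciPy and scikit-learn assumed); it is an approximation, not the XPCA estimator itself.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(11)

# Invented ordinal survey-style data: 200 respondents, 5 items on a 1-7 scale.
latent = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + rng.normal(0, 0.5, (200, 5))
X = np.clip(np.digitize(latent, bins=np.linspace(-2, 2, 6)) + 1, 1, 7)

# Empirical-CDF marginals -> Gaussian copula scores (a rough stand-in for XPCA's
# censored-likelihood fit), then ordinary PCA on the normal scores.
ranks = stats.rankdata(X, axis=0, method="average")
u = ranks / (X.shape[0] + 1)                 # empirical CDF values in (0, 1)
z = stats.norm.ppf(u)                        # normal scores
pca = PCA(n_components=2).fit(z)
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
```

---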
Insights, Predictions, and Actions: Descriptive Definitions of Data Science, Machine Learning, and Artificial Intelligence | Andrew Flack | Thursday, March 22 3:00 PM

The terms “Data Science”, “Machine Learning”, and “Artificial Intelligence” have become increasingly common in popular media, professional publications, and even in the language used by DoD leadership. But these terms are often not well understood, and may be used incorrectly and interchangeably. Even a textbook definition of these fields is unlikely to help with the distinction, as many definitions tend to lump everything under the umbrella of computer science or introduce unnecessary buzzwords. Leveraging a framework first proposed by David Robinson, Chief Data Scientist at DataCamp, we forgo the textbook definitions and instead focus on practical distinctions between the work of practitioners in each field, using examples relevant to the test and evaluation community where applicable.