Genevieve Gandara

Journal Entry For: Module 8 - Generative Design and ML

Link to Student

Gandara, Genevieve

Video: Google link

Module 8 Questions:

Part 1

Option A: Using Generative Design and Machine Learning for Faster Analysis Feedback (AU 2020)

What problem the speakers are solving/talking about, and the approach they take/propose:

Machine learning can be used to accelerate generative design. Standard practice in generative design relies on physics-based models to evaluate each design option. In this talk, the team at Autodesk trained a surrogate model to replace a conventional physics-based model. The surrogate model predicts the building's energy performance much faster than the deterministic physics-based model.

They begin by defining the problem as a set of inputs and outputs. Next, they generate synthetic building data with a generative algorithm that automates the sampling of the input design variables. They use Insight simulation outputs as the training labels, treating the simulated EUI as the ground truth for the surrogate model. Next, they choose the type of machine learning model, then train and validate it through an iterative process. Lastly, they report their results on a held-out test set. They deployed the model in Dynamo as a custom node that takes building parameters as inputs and returns EUI as the output. In their demo, the surrogate returns EUI with very little lag compared to physics-based models.
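The workflow above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the speakers' implementation: `simulate_eui` is a hypothetical stand-in for an Insight run (its formula is invented), the design variables and ranges are made up, and a k-nearest-neighbours regressor is my own choice of surrogate because it needs no external libraries.

```python
import math
import random

# Hypothetical design variables and ranges (made up for illustration).
BOUNDS = {"wwr": (0.1, 0.9), "wall_r_value": (5.0, 30.0), "overhang_m": (0.0, 2.0)}


def simulate_eui(wwr, wall_r_value, overhang_m):
    """Hypothetical stand-in for a physics-based run; NOT a real energy model."""
    return 80 + 60 * wwr - 1.5 * wall_r_value - 8 * overhang_m * wwr


def sample_designs(n, rng):
    """Randomly sample n designs within BOUNDS (synthetic data generation)."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS.values()) for _ in range(n)]


def normalize(x):
    """Scale each input to [0, 1] so no variable dominates the distance."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(x, BOUNDS.values()))


class KNNSurrogate:
    """k-nearest-neighbours regressor: averages the labels of the k closest designs."""

    def __init__(self, k):
        self.k = k
        self.data = []

    def fit(self, X, y):
        self.data = [(normalize(x), label) for x, label in zip(X, y)]
        return self

    def predict(self, x):
        q = normalize(x)
        nearest = sorted(self.data, key=lambda d: math.dist(q, d[0]))[: self.k]
        return sum(label for _, label in nearest) / len(nearest)


def mae(model, X, y):
    """Mean absolute error of the surrogate against simulated labels."""
    return sum(abs(model.predict(x) - t) for x, t in zip(X, y)) / len(y)


def run_workflow(n=500, seed=0):
    rng = random.Random(seed)
    X = sample_designs(n, rng)                  # sampled inputs
    y = [simulate_eui(*x) for x in X]           # simulated EUI as ground-truth labels
    a, b = int(0.6 * n), int(0.8 * n)           # 60/20/20 train/validation/test split
    X_tr, y_tr, X_va, y_va = X[:a], y[:a], X[a:b], y[a:b]
    # Validation: pick k by comparing candidate models on held-out data.
    best_k = min((1, 3, 5, 9), key=lambda k: mae(KNNSurrogate(k).fit(X_tr, y_tr), X_va, y_va))
    model = KNNSurrogate(best_k).fit(X_tr, y_tr)
    return model, best_k, mae(model, X[b:], y[b:])  # report on the untouched test set


if __name__ == "__main__":
    model, k, test_mae = run_workflow()
    print(f"k={k}, test MAE={test_mae:.2f}")
    print(f"predicted EUI for (0.4, 20, 0.5): {model.predict((0.4, 20.0, 0.5)):.1f}")
```

The test set is only touched once, after k has been chosen on the validation set, which mirrors the train, validate, then test order described in the talk.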

One thing in the talk that genuinely surprised you, or that you'd push back on:

Since it takes time to generate synthetic data and train a surrogate model, at what point is it worth developing the machine learning model? I assume it would only be worth the time to generate synthetic data and train a surrogate model if the model would be used many times. Otherwise, considering total time, it may be faster to use a deterministic model that has already been developed and validated.
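The tradeoff in this question can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, hypothetically, that building the training set costs one physics-based run per sample plus a fixed training time; all timings in the example are made up.

```python
import math


def break_even_queries(n_train, t_physics_s, t_train_s, t_surrogate_s):
    """Smallest number of design evaluations N for which the surrogate saves time.

    Assumes (hypothetically) that building the training set costs n_train
    physics-based runs and that training takes t_train_s seconds.
    The surrogate wins when
        n_train * t_physics + t_train + N * t_surrogate < N * t_physics,
    i.e. N > (n_train * t_physics + t_train) / (t_physics - t_surrogate).
    """
    if t_surrogate_s >= t_physics_s:
        return math.inf  # the surrogate is never faster per query
    setup_s = n_train * t_physics_s + t_train_s
    return math.floor(setup_s / (t_physics_s - t_surrogate_s)) + 1


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 training runs at 60 s each, 10 min of training,
    # 0.05 s per surrogate prediction.
    print(break_even_queries(1000, 60.0, 600.0, 0.05))  # 1011
```

With these made-up numbers, the surrogate only pays off after 1,011 evaluations, slightly more than the 1,000 simulations spent building its training set. That supports the intuition that it is only worth it for a tool that will be used many times.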

What you'd want to try yourself or what would have to be true for you to try it in a project:

I would try this method on a project that requires many iterations of generative design, or as a tool that would be reused many times. I would want more guidance on which generative algorithm to use to generate the synthetic data. The speakers suggest four options: the Randomize, Cross Product, Like This, and Optimize solvers. I would also make sure that the process is as streamlined and automated as possible, so I don’t have to manually pass data from one step to the next. For example, I would like the synthetic data generator to be connected directly to Insight to get the ground-truth labels.
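To make the difference between two of these options concrete, here is a minimal sketch of what Cross Product and Randomize sampling do conceptually. This is my own illustration, not the actual implementation of the Generative Design solvers, and the variables and levels are hypothetical.

```python
import itertools
import random

# Hypothetical design variables, each with a few discrete levels.
LEVELS = {
    "wwr": [0.2, 0.4, 0.6],
    "wall_r_value": [10, 20, 30],
    "overhang_m": [0.0, 1.0],
}


def cross_product(levels):
    """Every combination of levels (full factorial); count grows multiplicatively."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in itertools.product(*levels.values())]


def randomize(levels, n, seed=0):
    """n designs with each variable's level drawn at random (duplicates possible)."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in levels.items()} for _ in range(n)]


if __name__ == "__main__":
    print(len(cross_product(LEVELS)))  # 3 * 3 * 2 = 18 designs
    print(len(randomize(LEVELS, 10)))  # exactly the 10 designs requested
```

Cross Product guarantees full coverage of the levels but scales badly: 10 variables with 5 levels each already give 5^10, about 9.8 million designs. Random sampling lets you fix the simulation budget up front, at the cost of possibly missing some combinations.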

Part 2

One or two moments from the quarter where you felt the limits of the current tools, what an AI- or ML-augmented version of that workflow might look like (even speculatively), and whether you'd actually want that augmentation, or whether part of the friction was the point:

Over the past 8 weeks, I have learned many new skills for parametric modeling. One of my biggest points of friction was the node-based format of block coding. There were many times when I felt that implementing the logic would be much easier in a simple scripting language like Python. I initially used Grasshopper because there were more resources available for creating geometry in Grasshopper. However, in Module 5 I switched to Dynamo and found its code blocks much easier to use than Grasshopper's equivalents. Eventually, I used the predefined nodes to generate the geometry but relied heavily on Dynamo code blocks for the logic of my programs; for example, boolean logic and long equations were faster to write in a code block than to assemble from nodes. An LLM could write this Dynamo code instead, so the user would not have to search through hundreds of nodes or write code blocks by hand. The user could describe the logic they want to implement, and the LLM could generate the corresponding Dynamo code.

In Modules 3 and 4, when I used Grasshopper to develop more complex geometries, I noticed that the geometry took a while to regenerate every time I flexed a parameter. As discussed in Part 1, this could be a use case for a surrogate model, especially to speed up the evaluation of generated forms instead of rerunning a slower analysis each time. For example, an ML model could be trained on previous design options and their performance metrics, then used to quickly predict daylight, insolation, or other outputs. However, the surrogate model would need to be trained on geometries that actually capture the complexity of the forms I created, and it would need to use the same input parameters. I would want this kind of augmentation for repetitive tasks, but I would not want AI to remove all of the friction. Part of the friction was useful because it forced me to understand how the geometry, parameters, and metrics were actually connected.

Part 3

Find three tools, companies, projects, research or open-source efforts working at the intersection of AI/ML and the built environment.

BESOS Python library

Citation of conference paper: Christiaanse, T. V., Westermann, P., Beckett, W., Faure, G., & Evins, R. (2021). BESOS: A Python library that links EnergyPlus with energy hub, optimization and machine learning tools. In Proceedings of Building Simulation 2021: 17th Conference of IBPSA (pp. 1951–1958). International Building Performance Simulation Association. https://doi.org/10.26868/25222708.2021.30726

Documentation: https://besos.readthedocs.io/en/stable/

I first learned about BESOS during my undergraduate thesis in 2024, when I was looking for tools to link EnergyPlus to an optimization scheme in Python, which is exactly what BESOS does. It lets users bring EnergyPlus files, weather files, and optimization algorithms from the Platypus library into one Python workflow. BESOS also supports surrogate modeling. In this workflow, the user first defines a set of building input parameters, samples many design options, and runs those options through EnergyPlus to generate simulation outputs such as EUI, heating demand, cooling demand, or cost. BESOS can then connect this dataset to machine-learning libraries like scikit-learn or TensorFlow to train a surrogate model that approximates the EnergyPlus results. The surrogate model does not replace EnergyPlus as the source of truth, but once it has learned from previous simulation runs, it can predict outputs much faster. This would have changed my work this quarter because it could have made parametric evaluation and optimization faster, especially when I wanted to test many design alternatives and compare tradeoffs without waiting for a full simulation each time. It could also have allowed me to incorporate energy analysis into my evaluations.
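One way a surrogate could speed up optimization while EnergyPlus stays the source of truth is a screen-then-verify loop: rank thousands of candidates with the cheap surrogate, then re-simulate only a short list. The sketch below is a generic plain-Python illustration, not the BESOS API; `simulate` is a hypothetical stand-in for an EnergyPlus run and `surrogate` a hypothetical stand-in for a model trained on earlier runs (here just a deliberately slightly wrong formula).

```python
import random

calls = {"simulate": 0}  # counts how many "expensive" simulations were run


def simulate(design):
    """Hypothetical stand-in for an EnergyPlus run (invented formula)."""
    calls["simulate"] += 1
    wwr, wall_r_value = design
    return 80 + 60 * wwr - 1.5 * wall_r_value


def surrogate(design):
    """Hypothetical trained surrogate: fast, close to simulate(), but not exact."""
    wwr, wall_r_value = design
    return 82 + 57 * wwr - 1.5 * wall_r_value


def screen_then_verify(candidates, predict_fn, simulate_fn, top_k=5):
    """Rank every candidate with the surrogate, then simulate only the top_k."""
    shortlist = sorted(candidates, key=predict_fn)[:top_k]  # lower EUI is better
    verified = [(simulate_fn(design), design) for design in shortlist]
    return min(verified)  # (simulated EUI, design) of the best verified option


if __name__ == "__main__":
    rng = random.Random(1)
    candidates = [(rng.uniform(0.1, 0.9), rng.uniform(5, 30)) for _ in range(10_000)]
    best_eui, best_design = screen_then_verify(candidates, surrogate, simulate)
    print(f"best simulated EUI {best_eui:.1f} after {calls['simulate']} simulations "
          f"instead of {len(candidates)}")
```

The shortlist size is the main knob: a larger `top_k` costs more simulations but makes it less likely that surrogate error hides the best design.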

Data2BEM

Paper citation: Lu, J., Zheng, Z., Langtry, M., Jackson, M., Zhao, Y., Feng, C., Zhang, R., Zhang, C., Zhang, J., & Choudhary, R. (2025). Automated building energy modeling for energy retrofits using a large language model-based multi-agent framework. iScience, 28(11), 113867. https://doi.org/10.1016/j.isci.2025.113867

Data2BEM is a tool that uses an LLM-based multi-agent framework to automate the generation of building energy models. It was cited in a CEE 256 guest lecture as one of the strongest methods currently published for using machine learning to automate BEM, although humans still outperform it in some aspects. Data2BEM takes heterogeneous building information, such as architectural drawings, design specifications, and sensor data, and turns it into a calibrated building energy model. Instead of requiring a person to manually interpret drawings, enter envelope and HVAC parameters, set schedules, run calibration, and evaluate retrofit scenarios, the framework divides those tasks across multiple LLM agents.

This is interesting because building energy modeling is usually slow and requires a lot of expert judgment. Even when the modeling software already exists, there is still a lot of friction in translating real building information into a simulation-ready model. This would have changed how I worked this quarter because a lot of my parametric work still required me to manually define geometry, assign parameters, create logic, and interpret outputs. A tool like Data2BEM would not replace the need to understand the model, but it could have reduced the time spent on repetitive setup and let me focus more on evaluating design tradeoffs and understanding the results. It would also have allowed me to integrate energy analysis.

Per-FORM

Paper citation: Mokhtar, S., & Mueller, C. (2026). Implicit neural representations for surrogate modeling in the built environment. Automation in Construction, 182, 106744. https://doi.org/10.1016/j.autcon.2025.106744

This paper is about making slow building performance simulations faster using machine learning. Normally, if you want to know how wind moves around a building, you might run a CFD simulation. CFD can be very accurate, but it is slow and computationally expensive, especially if you are testing many building shapes during early design. The authors’ framework, called Per-FORM, learns from many CFD simulations so that it can predict similar wind-flow results much faster for new building forms. The paper describes Per-FORM as an implicit-neural-representation framework for predictive modeling in the built environment, especially for complex geometry and continuous physical fields. In other words, instead of a single summary number, it predicts a spatial field: the pattern of wind speeds around the building. That is more useful for design because environmental performance is often spatial. This tool could have allowed faster evaluation of environmental effects, such as wind, on my buildings.

The paper’s data availability section says the code and instructions are available on GitHub.