Share this article:

Speeding up IT Migrations with GenAI – Key Learnings from Translating SQL Queries

Share This Article

Many businesses face a critical challenge when adopting new technologies: dealing with old, outdated code. This code is often difficult to maintain, and rewriting it in a new technology is usually both time-consuming and resource-intensive. With the advent of Generative AI, old code can now be translated from an outdated language to a more modern and performant one, allowing businesses to migrate code across stacks with minimal resources. In this article, we explore three approaches leveraging generative AI to migrate old SQL queries to Python code, as described in the following video.

Approach #1 - SQL to SPARK SQL

As a first approach, we attempted to migrate SQL code from an Oracle database to SPARK SQL code. The GPT3.5 model, when used with the right prompts, provided good results for simple SQL queries with little human intervention needed. However, it struggled with complex queries, such as recursive common table expressions, and it showed limitations on the length of code it could handle. When implementing the newer GPT4 model, the quality of results improved significantly, but there was still a limitation on the length of code that could be translated.

Approach #2 - SQL to PySpark

In a second attempt, we focused on translating SQL code to PySpark, as Python provides more flexibility in finding solutions. Similarly to the first approach, the model performed well with simple queries, but struggled with more complex ones, again due to limitations on the context length.

It is worth noting that some important considerations remain with such an approach, such as the risk of exceeding the allowed limit, waiting time for model responses, and the cost of usage. Yet, we can expect future releases of large language models to handle longer queries as their context windows increase.

Approach #3 - SQL to PySpark (Divide & Conquer)

In a third attempt, we decided to break down large, complex SQL queries into smaller parts using abstract syntax trees and build larger queries from the partial results. This approach provided greater transparency to the developer, and made it easier to trace and correct errors. It also prompted more strategic business questions, such as questioning the need to maintain a monolithic view built from 3,000 (!) lines of code.

Importantly, this approach was significantly faster because smaller queries could be processed using smaller and faster models. Yet, we were also faced with incorrect query translations due to inconsistent use of aliases in the code, which caused issues with column ambiguity in Spark. The model was able to partially handle this problem, occasionally making minor mistakes that could be fixed with followup prompts focusing on fixing the subquery. Interestingly, asking GPT3.5 several times increased the chances of obtaining a good enough solution compared to waiting for a GPT4 response.

 

In summary, these different GenAI approaches can provide significant support for developers struggling with the migration of legacy code. The preliminary results are extremely promising, with the resulting code allowing us to reproduce the logic embedded in SQL in a simple way. Besides the technological advancements, we should emphasize the need of having a human in the loop, to cater for the highlighted limitations, especially for the most complex queries.

Ready to know more? Check out our advanced demo!

Interested in learning more about query translation?

Contact us

Want to receive updates from us?

agree_checkbox

By subscribing, you consent to Unit8 storing and processing the data provided above in order to provide you with the requested content. For more information, please review our Privacy Policy.

Our newsletter features industry news, the latest case studies, and future Unit8 events.

close

This page is only available in english