ESG Reporting: How LLMs are Powering the Next-Gen Knowledge Assistant

Sustainability reporting is no longer a "nice to have"; it's a critical business imperative. Driven by increasing environmental concerns, stringent regulations, and demanding stakeholders, companies worldwide are scrambling to get their ESG (Environmental, Social, and Governance) reporting right. The International Financial Reporting Standards (IFRS) Foundation has stepped up, introducing comprehensive sustainability-related disclosure standards. But, let's face it: navigating these complex and ever-evolving standards can feel like trying to solve a Rubik's Cube blindfolded.

That's where the power of AI, specifically Large Language Models (LLMs), comes into play. This blog post delves into our research (inspired by a Squareboat project on ESG compliance reporting) that tackles the challenge head-on, paving the way for a new era of domain-specific knowledge assistants in sustainability reporting. Our team has been at the forefront of applying LLMs to real-world business challenges, and sustainability reporting is a particularly ripe area for innovation.

The Challenge: A Knowledge Gap

Imagine you're tasked with preparing an IFRS-compliant sustainability report. Where do you even begin? The standards are intricate, constantly being updated, and interpreting them requires deep domain expertise. We've found that many organizations struggle with the sheer volume of information and the need to stay current with the latest pronouncements. Compounding this issue is a significant lack of readily available, high-quality question-answering systems and datasets specifically tailored to IFRS sustainability reporting. This absence has severely hampered the development of AI-powered tools that could genuinely assist companies in this crucial area. We've seen firsthand how this knowledge gap leads to inefficiencies, increased costs, and even potential compliance issues.

The Solution: A Custom-Built Approach Using LLMs

We recognized this gap and embarked on a project to create a solution: a domain-specific knowledge assistant powered by LLMs. The core of our approach hinges on two key innovations, which we believe are critical for any successful LLM-based application in this space:

  1. A Synthetically Generated, High-Quality Dataset: We crafted a novel dataset of 1,063 diverse question-answer pairs based on the IFRS sustainability standards. This wasn't just a random collection of questions; it was meticulously created using a custom generation and evaluation pipeline that leverages the capabilities of LLMs. Techniques like chain-of-thought reasoning and few-shot prompting were employed to ensure the dataset covered a wide spectrum of potential user queries, and it achieves an average score of 8.16 out of 10 across metrics like faithfulness, relevance, and domain specificity. We focused on creating questions that reflect the nuanced challenges faced by sustainability professionals.

  2. Two Custom Architectures for Question-Answering: We didn't just stop at the dataset. We developed two distinct architectures optimized for question-answering in the sustainability reporting domain. Our experience has shown that a one-size-fits-all approach simply doesn't work when dealing with the complexities of IFRS.

    • Retrieval Augmented Generation (RAG) Pipeline: This architecture combines the power of information retrieval with the generative capabilities of LLMs. It first retrieves relevant documents or passages from a knowledge base and then uses the LLM to generate an answer based on the retrieved information. This approach is particularly useful when the answer requires referencing specific sections of the IFRS standards.

    • Fully LLM-Based Pipeline: This architecture relies solely on a fine-tuned LLM to answer questions, without explicitly retrieving external information. This is ideal for questions that require a more conceptual understanding of the standards.
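To make the few-shot prompting idea concrete, here is a minimal sketch of how a QA-generation prompt over an IFRS passage might be assembled. The exemplar pair, field labels, and wording are illustrative assumptions, not the exact prompts used in the project:

```python
# Illustrative few-shot prompt builder for generating QA pairs from an IFRS
# passage. The exemplar and instructions are hypothetical stand-ins.

EXEMPLARS = [
    {
        "question": "What greenhouse gas emissions must an entity disclose under IFRS S2?",
        "answer": "IFRS S2 requires disclosure of Scope 1, Scope 2 and Scope 3 "
                  "greenhouse gas emissions, among other cross-industry metrics.",
    },
]

def build_qa_prompt(passage: str, exemplars=EXEMPLARS, n_questions: int = 3) -> str:
    """Assemble a few-shot, chain-of-thought prompt asking an LLM for QA pairs."""
    shots = "\n\n".join(
        f"Question: {ex['question']}\nAnswer: {ex['answer']}" for ex in exemplars
    )
    return (
        "You are an expert in IFRS sustainability reporting.\n"
        "Given the passage below, think step by step about what a reporting "
        f"professional might ask, then write {n_questions} question-answer pairs "
        "in the same format as the examples.\n\n"
        f"Examples:\n{shots}\n\n"
        f"Passage:\n{passage}\n\nQuestion-answer pairs:"
    )
```

The resulting string would be sent to whichever LLM backs the generation pipeline; varying the exemplars per question type is one simple way to steer diversity.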

Diving Deep: Dataset Generation and Evaluation

The creation of the synthetic dataset was a critical step. We employed a sophisticated pipeline involving multiple stages, drawing on best practices from the field of data science and natural language processing:

  • Data Collection and Preparation: Gathering and structuring the necessary IFRS sustainability standards documentation. This involved parsing PDF documents and extracting relevant textual information. Prompts for PDF parsing were designed to maintain the contextual integrity of the extracted data, a crucial step often overlooked in similar projects. We also incorporated expert reviews of the extracted content to ensure accuracy.

  • QA Design: Defining the structure and types of questions to be generated. We focused on creating question structures suitable for few-shot prompting during QA generation. We also considered the different types of questions that sustainability professionals typically ask, ranging from simple factual inquiries to more complex scenario-based questions.

  • QA Generation: Using LLMs with techniques like few-shot prompting and chain-of-thought reasoning to generate the question-answer pairs. The prompt engineering was carefully designed to elicit high-quality and diverse questions. We experimented extensively with different prompting strategies to maximize the quality of the generated data.

  • QA Evaluation: Implementing a rigorous evaluation framework to assess the quality of the generated data, focusing on metrics like faithfulness, relevance, and domain specificity. This involved custom LLM-based evaluation metrics for robust quality control. We leveraged LLMs to evaluate the synthetically generated QA pairs using carefully crafted prompts that focused on dimensions like relevance and correctness.

  • Post-Processing: Refining the dataset through filtering and cleaning to ensure high quality. This involved implementing functions to improve the quality of the question-answer pairs by removing incomplete data. We also performed manual reviews of the dataset to identify and correct any errors or inconsistencies.
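The post-processing step above can be sketched as a simple filter over the LLM-evaluated pairs. The field names, score dimensions, and threshold here are illustrative assumptions about the schema, not the project's actual implementation:

```python
# Sketch of post-processing: drop incomplete QA pairs and keep only those
# whose LLM-assigned evaluation scores clear a quality threshold.
# Schema and threshold are hypothetical.

def clean_qa_pairs(pairs, min_score=6.0):
    """Keep pairs with both fields populated and a passing average score."""
    kept = []
    for p in pairs:
        if not p.get("question") or not p.get("answer"):
            continue  # remove incomplete data
        # e.g. scores for faithfulness, relevance, domain specificity
        scores = p.get("scores", {})
        avg = sum(scores.values()) / len(scores) if scores else 0.0
        if avg >= min_score:
            kept.append({**p, "avg_score": round(avg, 2)})
    return kept
```

Running this over the raw generated pairs leaves only complete, high-scoring examples for training and evaluation.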

The result? A dataset with an average score of 8.16 out of 10 across these critical metrics. This ensures the assistant is trained on reliable and accurate information. We also analyzed question embeddings to visualize the diversity and coverage of the generated questions within the IFRS knowledge space. This analysis helped us identify any gaps in the dataset and ensure that it covers all relevant aspects of IFRS sustainability reporting.
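One simple proxy for the diversity analysis described above is mean pairwise cosine similarity between question embeddings: lower average similarity suggests broader coverage of the knowledge space. The toy vectors below are hand-made stand-ins; a real pipeline would obtain embeddings from a sentence-embedding model:

```python
# Toy sketch of an embedding-diversity check. Lower mean pairwise cosine
# similarity between question embeddings suggests broader topical coverage.
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all embedding pairs."""
    sims = [cosine(u, v) for u, v in combinations(embeddings, 2)]
    return sum(sims) / len(sims)
```

In practice one would also project the embeddings to 2-D (e.g. with t-SNE or UMAP) to visually spot under-covered regions of the IFRS knowledge space.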

The Architectures: RAG vs. LLM-Based

Let's break down the two custom architectures:

1. Retrieval Augmented Generation (RAG) Pipeline

RAG is a powerful approach when dealing with large volumes of information. Based on our experience, RAG significantly enhances the accuracy and reliability of the LLM by grounding its responses in verifiable sources. Here's how it works:

  1. User Query: You ask a question about IFRS sustainability reporting.

  2. Retrieval: The system uses information retrieval techniques (e.g., vector search using FAISS, keyword search) to find the most relevant documents or passages from the IFRS standards. We experimented with various techniques to optimize the retrieval of relevant context for question answering. Our research indicated that hybrid approaches, combining vector search with keyword search, tend to yield the best results.

  3. Augmentation: The retrieved information is combined with the original query.

  4. Generation: The augmented query is fed into an LLM, which generates an answer based on the retrieved context.
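The four steps above can be sketched end to end. The project used vector search (FAISS) alongside keyword search; for a self-contained illustration, this toy version scores passages by keyword overlap and stubs out the LLM call, so the knowledge base and scoring are simplified assumptions:

```python
# Minimal sketch of the retrieve-augment-generate loop. Passages are ranked
# by keyword overlap with the query (a stand-in for FAISS vector search),
# and the LLM call is left as a pluggable stub.

KNOWLEDGE_BASE = [
    "IFRS S1 sets out general requirements for sustainability-related disclosures.",
    "IFRS S2 requires disclosure of Scope 1, Scope 2 and Scope 3 emissions.",
]

def retrieve(query: str, passages, k: int = 1):
    """Step 2: return the k passages sharing the most terms with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(q_terms & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer(query: str, llm=None):
    """Steps 3-4: augment the query with retrieved context, then generate."""
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # `llm` would call a hosted or local model; with no model, return the prompt.
    return llm(prompt) if llm else prompt
```

Swapping the overlap scorer for FAISS similarity search (and adding a keyword channel) turns this toy loop into the hybrid retriever described above.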

2. Fully LLM-Based Pipeline

In this architecture, the LLM is the star of the show. This approach requires a highly fine-tuned LLM with a deep understanding of the IFRS standards.

  1. User Query: You ask your question.

  2. Industry Classification: An industry classifier first determines the context of the query, i.e. whether it relates to a single industry or spans multiple industries. This classification is crucial for tailoring the LLM's response; we've found that industry-specific knowledge is often essential for interpreting the IFRS standards correctly.

  3. LLM Inference: The query and the industry classification are fed to a domain-specific fine-tuned LLM (using Low-Rank Adaptation (LoRA) for efficient fine-tuning), which then generates an answer based on its learned knowledge.
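For readers unfamiliar with LoRA, the core idea is that the frozen pretrained weight W is augmented with a trainable low-rank product B·A, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. The tiny numeric sketch below illustrates the forward pass only; in practice this lives inside a fine-tuning library (e.g. PEFT) rather than hand-rolled code:

```python
# Conceptual sketch of a LoRA-adapted linear layer:
#   y = W x + (alpha / r) * B (A x)
# W stays frozen; only the low-rank factors A (r x d_in) and B (d_out x r)
# are trained. Dimensions here are deliberately tiny.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=2.0, r=1):
    base = matvec(W, x)               # frozen pretrained path
    delta = matvec(B, matvec(A, x))   # trainable low-rank adapter path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]
```

During fine-tuning only A and B receive gradient updates, which is what makes domain adaptation of a large model affordable.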

The Results: Performance that Speaks Volumes

We rigorously tested these systems using multiple-choice questions and real-world case studies. Our testing methodology was designed to simulate the types of questions that sustainability professionals would encounter in their daily work. Here's what we found:

  • RAG Pipeline Accuracy: 85.32% on single-industry questions and 72.15% on cross-industry questions. This outperformed the baseline approach by 4.67 and 19.21 percentage points, respectively.

  • LLM-Based Pipeline Accuracy: A staggering 93.45% on single-industry questions and 80.30% on cross-industry questions, an improvement of 12.80 and 27.36 percentage points over the baseline.

These results clearly demonstrate the superiority of both custom architectures over traditional methods. The LLM-based pipeline, in particular, showcased remarkable accuracy, highlighting the power of fine-tuning. The inclusion of an industry classification component significantly improved the handling of complex, multi-industry queries. This underscores the importance of tailoring AI solutions to specific domain needs.

Key Takeaways and Implications

This research has significant implications for the future of sustainability reporting:

  • AI-Powered Assistance: LLMs can be effectively leveraged to build domain-specific knowledge assistants that provide accurate and timely guidance on complex reporting standards.

  • Dataset Importance: The creation of high-quality, domain-specific datasets is crucial for training and evaluating these AI systems.

  • Architectural Innovation: Custom architectures, like the RAG and LLM-based pipelines, can significantly outperform baseline approaches.

  • Improved Compliance: By providing readily accessible knowledge and guidance, these AI assistants can help companies improve their compliance with IFRS sustainability standards.

  • Reduced Costs: Automating aspects of sustainability reporting can significantly reduce costs associated with manual research and compliance efforts.

The Future is Intelligent

Our work demonstrates the transformative potential of LLMs in sustainability reporting. As these technologies continue to evolve, expect to see even more sophisticated AI-powered tools that empower businesses to navigate the complexities of ESG reporting and drive sustainable business practices. It will be interesting to see how this nascent sector evolves. We are committed to continuing our research and development in this area, with a focus on creating even more powerful and user-friendly AI solutions for sustainability reporting.

What are your thoughts on this approach? How do you envision AI transforming sustainability reporting in the coming years? Share your insights in the comments below!

Discover more about our innovative project.

Got an idea brewing? We’d love to hear it!