The role of unstructured data in AI and LLMs

Businesses rely on well-organised data, so if models trained on unstructured data are not handled correctly, security may be compromised

Artificial intelligence (AI) and machine learning (ML) have made it easier for people without security expertise to create their own data products, a decentralised approach often associated with data mesh. However, most enterprises work with structured data rather than unstructured data, which is more difficult to handle. If unstructured data is not managed properly when it is used to train models, the result could be misleading insights and even security breaches.

Speaking with Data Centre Magazine, Mark Semenenko, Director of Solutions Architecture at Immuta, discusses how he and his team design Immuta’s data security solutions, ensuring they bridge the technology gap between a customer’s goals and their existing tech stack.

How did you come to work in the data industry? 

“I studied computer science at The University of Sheffield before joining IBM’s graduate scheme in its software group. During my time at IBM, I was responsible for IBM’s Smarter Cities offerings, including the Intelligent Operations Center and Intelligent Video Analytics solutions. In combination, these solutions exemplified how advanced analytics and machine learning could deliver massive value, turning knowledge into actionable insights.

“Continuing my journey into the world of data, I joined Pentaho, an open-source big data integration platform, where I gained an in-depth understanding of big data and the challenges, importance and value of ingesting, processing and analysing data at a massive scale.

“Shortly after, understanding the massive value in data and recognising the challenges organisations face in balancing data privacy and security with fast access to data, I moved into the data privacy space. In my current and previous roles, I have seen the same data security and privacy challenges cloned from on-premises appliances to Hadoop clusters and now to the cloud.”

Can you tell us about the role of unstructured data in AI/LLMs and how they introduce new security considerations for organisations?   

“Training any machine learning or AI model requires data and lots of it. The more data the better; however, more data brings more risk and bigger challenges in managing who has access to what. With AI or LLMs, once the model is trained, you cannot reverse engineer the data from the model, so it’s imperative to know what the model was trained on and be absolutely certain the data was accurate, unbiased and secure.

“If you don’t, you risk your models being poisoned by malicious actors, becoming inherently biased by skewed data, or breaching privacy regulations or contractual obligations if your LLM inadvertently reveals confidential information. With structured data, most organisations already have some form of role-based access control in place; however, managing the same level of access on unstructured data, typically on block storage, is a blind spot for many organisations from two perspectives: who has access and what is being accessed.”
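As an illustration of the two perspectives raised above (who has access and what is being accessed), the following is a minimal Python sketch of tag-based, role-based access control over unstructured objects, with an audit trail. It is not Immuta's method or any vendor's API. The class names, roles, tags and file paths are all hypothetical, and a real deployment would rely on the storage platform's own permissions and logging.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class StoredObject:
    """An unstructured object (document, transcript, image) with sensitivity tags."""
    path: str          # hypothetical storage path
    tags: frozenset    # hypothetical sensitivity labels, e.g. {"pii"}


@dataclass
class AccessPolicy:
    """Maps each role to the tags it may read and records every decision."""
    allowed_tags: dict                               # role -> set of readable tags
    audit_log: list = field(default_factory=list)    # "what is being accessed"

    def can_read(self, role: str, obj: StoredObject) -> bool:
        # A role may read an object only if it is cleared for every tag on it.
        permitted = obj.tags <= self.allowed_tags.get(role, set())
        # Log the request whether or not it was granted.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "path": obj.path,
            "granted": permitted,
        })
        return permitted

    def training_set(self, role: str, objects: list) -> list:
        """Return only the objects this role is allowed to feed into a model."""
        return [o for o in objects if self.can_read(role, o)]


# Hypothetical roles and corpus, for illustration only.
policy = AccessPolicy(allowed_tags={
    "ml_engineer": {"public", "internal"},
    "compliance": {"public", "internal", "pii"},
})

corpus = [
    StoredObject("docs/press_release.txt", frozenset({"public"})),
    StoredObject("docs/support_call.txt", frozenset({"internal", "pii"})),
]

readable = policy.training_set("ml_engineer", corpus)
print([o.path for o in readable])
print(len(policy.audit_log))
```

In this sketch, the audit log answers "what is being accessed" and the role-to-tag mapping answers "who has access". Filtering the corpus before training matters because, as noted above, data cannot simply be taken back out of a model once it has been trained.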

How has the proliferation of AI shifted priorities across the security market? 

“AI has enormous potential, but while security is critical to the success of AI, it can also be one of the main blockers. The unquestionable benefits of AI have thrown data security into stark relief as organisations race to reap these rewards and not be left behind.

“Without solving the security challenges and providing complete access to all the required data in a timely fashion, organisations may find themselves in a quagmire of problems. Therefore, to overcome these obstacles, organisations have had to prioritise implementing a governance strategy and proper data security tools to deliver value from AI.”

Is there a need for government oversight/regulation?  

“In November last year, there were 95 disclosed data security incidents that resulted in 32 million breached records in Europe alone. With the value of data being so high and the risk to individuals being so massive, data needs to be adequately protected. 

“Data regulations don’t only protect us all as individuals, but they also provide the frameworks and guidelines for organisations to be able to use data legally, safely and ethically. However, the regulatory landscape is getting more complex by the day, and ultimately it is the flexibility of new regulations that will help decrease the compliance burden for businesses in an attempt to build a pro-growth regime.”

How are large organisations like Snowflake and Databricks solving the data security challenges of AI workloads? 

“Snowflake and Databricks have both upgraded their tooling for data protection in parallel with the capability to access and process more data from a wider variety of sources. The tools they provide can be leveraged to implement sophisticated and fine-grained access controls. The challenge faced by any large enterprise or national public sector body is one of scale and velocity in using this native functionality across thousands of tables, users, or both!

“This is how The Immuta Data Security Platform delivers value to our customers on these platforms. Immuta natively integrates with Snowflake and Databricks to leverage their native functionality when simplifying and scaling data security for our customers.”

What do you think the future of data, AI and LLMs looks like off the back of this?

“The future of data that LLMs will deliver is allowing anyone who can ask a question to get insight from their data instantly without needing to build an OLAP cube, author a dashboard or write a SQL query, but only if robust, scalable, and automated data security is in place.”

