Artificial intelligence (AI) has rapidly changed how we live and work, and the challenge of AI data bias has come to the forefront with it. As we head towards a Web3 future, it is only natural that we will see innovative products, solutions and services that use Web3 and AI in concert. And while some commentators maintain that decentralized technologies can be the answer to data bias, that couldn’t be further from the truth.
The Web3 market is relatively small and difficult to quantify: the ecosystem is in its early stages of development, and the exact definition of Web3 is still evolving. The market size in 2021 was estimated at close to $2 billion, and various analysts and research firms have reported an expected compound annual growth rate (CAGR) of approximately 45%. Combined with the rapid growth in Web3 solutions and consumer adoption, that puts the Web3 market on a course to be worth around $80 billion by 2030.
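As a rough check on market figures like these, compound growth can be worked through directly. The sketch below assumes a 2021 base year and nine annual compounding periods to 2030; both are assumptions about how such estimates are built, and the function names are illustrative only.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Constant annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    years = 2030 - 2021  # assumption: nine compounding periods
    projected = project_value(2.0, 0.45, years)  # values in billions of dollars
    needed = implied_cagr(2.0, 80.0, years)
    print(f"$2B at 45% for {years} years: ${projected:.1f}B")
    print(f"CAGR implied by $2B -> $80B: {needed:.1%}")
```

Under these assumptions, a 45% rate compounds $2 billion to roughly $57 billion, and reaching $80 billion would need a rate just over 50%. The quoted figures are therefore best read as approximate ranges from different sources, not as a single consistent projection.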
While the market is growing rapidly, the current state of the industry, combined with other tech industry factors, is why AI data bias is heading in the wrong direction.
The link between bias, quality and volume
AI systems rely on large amounts of high-quality data to train their algorithms. OpenAI’s GPT-3 family of models, on which ChatGPT is built, was trained on a massive amount of data. OpenAI has not disclosed the exact data used to train ChatGPT, but the training corpus is estimated to be on the order of hundreds of billions of words or more.
That data was filtered and preprocessed to ensure that it was of high quality and relevant to the task of language generation. OpenAI trained the model on this large dataset using the transformer architecture, a machine learning (ML) technique that allows it to learn patterns and relationships between words and phrases and to generate high-quality text.
The quality of AI training data has a significant impact on the performance of an ML model, and the size of the dataset can be a critical factor in the model’s ability to generalize to new data and tasks. But both quality and volume also have a significant impact on data bias.
Unique risk of bias
Bias in AI is an important issue as it can lead to unfair, discriminatory and harmful outcomes in areas such as employment, credit, housing, and criminal justice, among others.
In 2018, it was reported that Amazon had scrapped an AI recruiting tool that showed bias against women. The tool was trained on resumes submitted to Amazon over a 10-year period, most of which came from men, leading the AI to downgrade resumes containing words like “women’s.”
And in 2019, researchers found that a commercially available AI algorithm used to predict patient outcomes was biased against Black patients. The algorithm was trained on predominantly white patient data, leading it to have a higher false positive rate for Black patients.
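A disparity like this is usually surfaced by computing the same error metric separately for each group. The following is a minimal sketch of a per-group false positive rate; the function name, the record layout and the toy records are all hypothetical and are not taken from the study mentioned above.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """records: iterable of (group, actual, predicted) with boolean labels.

    Returns {group: false positives / actual negatives}. Groups with no
    actual negatives do not appear in the result.
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for group, actual, predicted in records:
        if not actual:  # only actual negatives can become false positives
            negatives[group] += 1
            if predicted:
                false_pos[group] += 1
    return {group: false_pos[group] / count for group, count in negatives.items()}


# Made-up records purely for illustration, not real patient data.
toy_records = [
    ("A", False, False), ("A", False, False), ("A", False, True), ("A", False, False),
    ("B", False, True), ("B", False, True), ("B", False, False), ("B", False, False),
    ("A", True, True), ("B", True, True),
]
rates = false_positive_rate_by_group(toy_records)
print(rates)
```

A large gap between groups on a metric like this is a signal to look at how each group is represented in the training data, which is the concern raised in the rest of this section.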
The decentralized nature of Web3 solutions, combined with AI, poses a unique risk of introducing bias. The quality and availability of data in this environment can be a challenge, making it difficult to train AI algorithms accurately. That is not just because few Web3 solutions are in use, but because of the narrow population that is in a position to use them.
We can draw a parallel with the genomic data collected by companies like 23andMe, which underrepresents poor and marginalized communities. The cost, availability and targeted marketing of DNA testing services such as 23andMe limit access for individuals from low-income communities and for those living in regions where the service doesn’t operate, which tend to be poorer, less developed countries.
As a result, the data collected by these companies may not accurately reflect the genomic diversity of the wider population, leading to potential biases in genetic research and the development of healthcare and medicine.
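One simple way to make this kind of sampling gap visible is to compare each group's share of a dataset with its share of the population the dataset is meant to represent. The sketch below assumes such reference shares are available; the group names and all numbers are hypothetical and do not describe any real dataset.

```python
def representation_gap(sample_counts, population_shares):
    """Return {group: sample share minus population share}.

    Negative values mean the group is underrepresented in the sample.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts must contain at least one record")
    return {
        group: sample_counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical numbers for illustration only.
sample_counts = {"high_income": 700, "middle_income": 250, "low_income": 50}
population_shares = {"high_income": 0.2, "middle_income": 0.4, "low_income": 0.4}

gaps = representation_gap(sample_counts, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

In this toy case the low-income group makes up 5% of the sample against 40% of the reference population, so any model trained on the sample sees far less of that group than it would in a representative dataset.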
And that leads us to another reason that Web3 increases AI data bias.
Industry bias and the focus on ethics
The lack of diversity in the Web3 startup industry is a major concern. As of 2022, women hold 26.7% of technology jobs. Of those, 56% are women of color. Executive positions in tech have an even lower representation of women.
In Web3, that imbalance is exacerbated. According to various analysts, fewer than 5% of Web3 startups have a female founder. This lack of diversity means there is a strong likelihood that AI data bias will be unconsciously overlooked by founders who are predominantly male and Caucasian.
To overcome these challenges, the Web3 industry must prioritize diversity and inclusiveness in both its data sources and its teams. Furthermore, the industry needs to change the story of why diversity, equality and inclusion are necessary.
From a financial and scalability perspective, products and services designed through differing perspectives are more likely to work for billions of customers rather than millions, making those startups with diverse teams more likely to have high returns and global scale capabilities. The Web3 industry must also focus on data quality and accuracy, ensuring that the data used to train AI algorithms is free from bias.
Can Web3 hold the answer to AI data bias?
One solution to these challenges is the development of decentralized data marketplaces that allow for the secure, transparent exchange of data between individuals and organizations. This can help mitigate the risk of biased data, as it allows for a wider range of data to be used in training AI algorithms. In addition, blockchain technology can be used to make the provenance and integrity of data transparent, which helps in checking that algorithms are not trained on biased data.
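The transparency property that a blockchain contributes here comes down to hash-linking records so that later changes are detectable. The following is a minimal single-machine sketch of that idea using Python's standard hashlib; it is not a blockchain or a marketplace implementation, and the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def record_hash(payload: dict, prev_hash: str) -> str:
    """SHA-256 over the canonical JSON payload plus the previous hash."""
    body = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Link each data record to the one before it."""
    chain, prev = [], GENESIS
    for payload in payloads:
        current = record_hash(payload, prev)
        chain.append({"payload": payload, "prev_hash": prev, "hash": current})
        prev = current
    return chain


def verify_chain(chain) -> bool:
    """Recompute every hash; any edited record breaks the chain."""
    prev = GENESIS
    for entry in chain:
        expected = record_hash(entry["payload"], prev)
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


# Hypothetical dataset contributions.
chain = build_chain([
    {"contributor": "org-1", "rows": 1200},
    {"contributor": "org-2", "rows": 300},
])
print(verify_chain(chain))
```

A structure like this makes tampering evident and records where data came from. It does not by itself make the data representative, so it supports auditing for bias without removing it.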
But ultimately, we will face the significant challenge of finding broad data sources for many years, until Web3 solutions are used by a mainstream audience.
While Web3 and blockchain continue to feature in mainstream news, such products and services are most likely to appeal to people in the startup and tech communities, which we know lack diversity and which are also a relatively small slice of the global pie.
It is hard to estimate the percentage of the world’s population that works in startups. In recent years, the industry has created approximately three million jobs in the U.S. Scaled against the total U.S. population, and not taking into account the jobs lost, that makes the tech industry far from representative of working-age citizens.
Until Web3 solutions become more mainstream, broaden their appeal beyond those with an inherent interest in tech, and become affordable and accessible to a broader population, access to high-quality data at sufficient volumes to train AI systems will remain a significant hurdle. The industry must take steps to address this issue now.
Alexandra Karpova is head of marketing at Lumerin.