13 April 2023

Assessing social data as real-world data: its benefits, its risks, its future

Sally Waddington

As researchers and practitioners increasingly seek information beyond the traditional boundaries of randomised controlled trials, interest in data collected from real-world situations has grown rapidly. Most recently, the growth of social media has opened new sources of information, known as social data.

This piece looks beyond the established use of real-world data (RWD) in clinical research and practice to explain why excitement has grown around the potential value of social data. We examine the benefits and risks of using social data and make some recommendations to help companies working with social data improve their decision-making.

What is RWD?

Data collected outside of the controlled environment of clinical trials is known as RWD, since it’s derived from ‘real-world’ situations, such as health surveys, electronic health records and registries [1]. The data collected can be used by a range of stakeholders, from policy makers and regulators to health payers [2].

The main use of RWD is to enhance clinical research and bridge the gap between clinical research and practice [3]. Technological developments, including widespread use of the internet and the development of apps, have prompted a growing awareness of the limitations of traditional randomised controlled trials (RCTs), such as high costs, time demands and limited generalisability. Promisingly, these technological advancements can provide insights from real-world situations on a much larger scale, strengthening the RWD pool and further bridging the gap between RCTs and clinical practice.

Why does it matter?

The value of RWD lies in its ability to inform real-world evidence (RWE), which can offset the design limitations of RCTs and inform real-world practice on questions such as long-term safety and effectiveness [4]. As a result, RWD is used in early development and throughout the product life cycle, including:

  • Discovering unmet needs in real-world settings, e.g. illness burden.
  • Facilitating hypothesis generation and clinical trial feasibility [4].
  • Providing information about the long-term safety of registered drugs.

Harnessing social media

Data collected from social media and other platforms – such as Twitter, Facebook and chat forums – is known as social data (SD). It differs from patient-reported outcomes (PROs), which are used in clinical trials and as sources of RWD; for example, in patient registries.

Unlike PROs, social data can capture spontaneous, unstructured patient-generated health data from patients and/or their caregivers [5]. As a source of RWD, this can inform both clinical trials and wider clinical research.

Promisingly, the National Institute for Health and Care Excellence (NICE) already considers patient-generated health data from social data to be a form of RWD. One example is the ZOE app, which allows individuals to record personal data to track the ongoing and long-term effects of COVID-19 and other conditions [6].

Making data useful

RWD is typically used to inform RWE only once a hypothesis has been formulated to test against it. Analysts also need to fully understand the data's biases and limitations, and the patient population it represents.

Any social data needs to be considered trustworthy and reliable before it can become RWE. However, the controls placed on self-reported data to achieve this can introduce fresh biases, which limit the extent to which patient-generated health data derived from social data can be used as RWE. Tackling this issue requires a new, standardised framework that ensures the collection of social data is consistent, trustworthy and credible.

Assessing the benefits…

Social data is most useful in pharmaceutical and healthcare research, where it can offer unsolicited, real-time insights into patients’ experiences, behaviour and thoughts regarding their illness and treatments, helping to uncover unmet needs. Historical social data can also be accessed to assess trends and changes over time.

Equally, some experiences are difficult to communicate; for example, caregivers’ perspectives on diseases where patients cannot share their experiences directly [7]. Social data adds value by enabling these voices to be heard.

Encouragingly, the FDA has published guidance on Patient-Focused Drug Development which assesses the potential benefits and limitations of social data [7]. The FDA is one of many organisations and communities committed to considering social data as RWD, which demonstrates the global need for, and interest in, this data source [7].

…And the risks

Despite the potential offered by social data, it remains an emerging field and, compared to more traditional sources, has less-established methodologies for data collection and analysis. Without any standardised framework, social data may be inconsistent; as a result, caution is needed when using it as RWD, particularly relating to pharmacovigilance and the long-term safety of products.

Additionally, because social media platforms can surface such a high volume of adverse event reports, social data could burden pharmaceutical companies without providing any assurance that the episodes reported are real. A risk analysis therefore needs to be conducted to understand the potential impact.

However, research shows that such cases tend to relate to milder events. Whilst it is critical that mild events are reported, the high volume of mild cases may just reflect nuances in social data. Therefore, the validity of social data shouldn’t necessarily be questioned simply because it doesn’t correspond with existing pharmacovigilance sources [8].

An exciting future

At Vox.Bio, we have extensive experience working with patient influencers and patient groups to capture and harness patient insights in a responsible and productive way. As such, we believe firmly that social data is a powerful tool to gain first-hand insights from real patients in real time, especially in early product development. In our experience, social data can and should also be used to inform decision-making across the product life cycle and could be used as RWE in the future.

We are confident that, over time, the evidence base from social data will strengthen to the point where it becomes commonplace for healthcare and pharmaceutical companies to use it to inform the research and development of products. If you are considering leveraging the potential of social data and want to get ahead of the curve, please reach out to us at Vox.Bio. We stand ready to help and look forward to hearing from you.


  1. Value in Health Journal
  2. Veradigm
  3. BMC Medical Research
  4. National Library of Medicine
  5. BMC
  6. National Institute for Health and Care Excellence
  7. Pistoia Alliance
  8. BMC Medicine


On 25 April 2023 at 16:00 BST, we’ll be hosting a free webinar discussing the role of the patient influencer in the future of healthcare. To join, simply follow the instructions here.