False Flagging

Also known as: Malicious Flagging, Coordinated Flagging

False flagging is the bad-faith submission of content reports on social media platforms: reporting posts that do not violate platform guidelines in order to suppress, harass, or remove legitimate content. It can be coordinated, with groups of users collectively reporting a target's content to trigger moderation action. As an abuse of the flagging mechanism, it disproportionately affects marginalized users, including disability advocates, LGBTQ+ content creators, and racial minority communities, whose content may be systematically targeted. Platform designs that lack safeguards against false flagging, such as detection of coordinated reporting patterns or penalties for repeated bad-faith reports, inadvertently enable this form of digital harassment and can contribute to the over-silencing of communities already underserved by content moderation systems.
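
One of the safeguards mentioned above, detection of coordinated reporting patterns, can be illustrated with a minimal sketch. It is not drawn from any real platform: the `Report` record, the `find_report_bursts` function, and the default thresholds (five distinct reporters within ten minutes) are all hypothetical choices made for illustration. The sketch flags a target when many distinct accounts report it within a short time window.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Report(NamedTuple):
    """Hypothetical report record; field names are assumptions."""
    reporter: str     # account that filed the report
    target: str       # content (or account) being reported
    timestamp: float  # seconds since some fixed epoch


def find_report_bursts(
    reports: Iterable[Report],
    window_seconds: float = 600.0,  # assumed threshold, not a platform value
    min_reporters: int = 5,         # assumed threshold, not a platform value
) -> Set[str]:
    """Return targets reported by at least `min_reporters` distinct
    accounts within any span of `window_seconds`."""
    by_target = defaultdict(list)
    for report in reports:
        by_target[report.target].append(report)

    flagged = set()
    for target, items in by_target.items():
        items.sort(key=lambda r: r.timestamp)
        start = 0                  # index of the oldest report still in the window
        counts = defaultdict(int)  # reports per reporter inside the window
        for report in items:
            counts[report.reporter] += 1
            # Drop reports that have fallen out of the time window.
            while report.timestamp - items[start].timestamp > window_seconds:
                old = items[start]
                counts[old.reporter] -= 1
                if counts[old.reporter] == 0:
                    del counts[old.reporter]
                start += 1
            if len(counts) >= min_reporters:
                flagged.add(target)
                break
    return flagged
```

A burst of reports is only a signal, not proof of bad faith, since content that genuinely violates guidelines can also attract many reports at once. In a design along these lines, flagged targets would more plausibly be routed to human review than acted on automatically.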

Category: content moderation · platform governance · digital equity

Related: Content Moderation · Platform Governance · Procedural Fairness

Sources

https://doi.org/10.1145/3797820