automation – Using Python to delete specified text from thousands of old blog posts
I’m a web editor who uses WordPress, and my site has an annoying problem. We have tens of thousands of articles going back about ten years, and we have to delete all of the images we posted in articles from around 2012 to 2018. The reason is that the editor at the time had a bad habit of using Creative Commons images without attributing them correctly, so we’re now vulnerable to legal action. I batch-deleted all the actual images from our media library, but that still leaves stray bits of image attribution text sitting in old articles, and it all looks a complete mess. It took me about a day just to clean up one month of old articles by hand.
Anyway, before I get forced to throw myself out a window to end the misery, I wondered whether, conceptually speaking, it might be possible to write some Python code to automate this. What’s required is a program that goes through every article we published between 2012 and 2018, finds the image attributions (they all start with “Credit:”), and deletes that text. I’m a novice with Python and have only just started thinking about this, but I wondered whether anyone with more experience thinks it is at least possible. Given the volume of content on this site, I honestly think it will take me less time to learn Python and do this than to go through every article deleting everything by hand.
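It does look feasible. Below is a minimal sketch of the idea, with several assumptions that need checking against real posts first. It assumes each attribution is either its own `<p>…</p>` paragraph or its own line starting with “Credit:” (the classic editor stores paragraphs as plain lines). It also assumes the WordPress REST API is enabled and that you have created an Application Password (available in WordPress 5.6 and later) for HTTP Basic authentication. The names `strip_credits`, `clean_site` and `_request` are hypothetical helpers, not part of WordPress or any library. Credits embedded mid-line (for example right after an `<img>` tag) or wrapped in other markup are not caught by these patterns.

```python
import base64
import json
import re
import urllib.parse
import urllib.request

# Assumption: an attribution is a whole <p ...>Credit: ...</p> paragraph.
# The "(?:\s[^>]*)?" part stops this from also matching <pre> tags.
CREDIT_PARAGRAPH = re.compile(r"<p(?:\s[^>]*)?>\s*Credit:.*?</p>\s*", re.DOTALL)
# Assumption: or a plain line that starts with "Credit:" (classic-editor content).
CREDIT_LINE = re.compile(r"^[ \t]*Credit:.*(?:\r?\n|$)", re.MULTILINE)


def strip_credits(html: str) -> str:
    """Return the post body with every 'Credit:' paragraph or line removed."""
    html = CREDIT_PARAGRAPH.sub("", html)
    return CREDIT_LINE.sub("", html)


def _request(url, user, app_password, data=None):
    """Hypothetical helper: authenticated GET (or POST when data is given)."""
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    body = None
    if data is not None:
        body = json.dumps(data).encode()
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp), resp.headers


def clean_site(site, user, app_password, dry_run=True):
    """Hypothetical driver: strip credits from posts published 2012-2018."""
    page, total_pages = 1, 1
    while page <= total_pages:
        query = urllib.parse.urlencode({
            "after": "2012-01-01T00:00:00",
            "before": "2019-01-01T00:00:00",
            "per_page": 100,      # the REST API's maximum page size
            "page": page,
            "context": "edit",    # needed to get content.raw (unrendered HTML)
        })
        posts, headers = _request(f"{site}/wp-json/wp/v2/posts?{query}",
                                  user, app_password)
        total_pages = int(headers.get("X-WP-TotalPages", 1))
        for post in posts:
            old = post["content"]["raw"]
            new = strip_credits(old)
            if new != old:
                print(f"post {post['id']}: {old.count('Credit:')} credit(s) found")
                if not dry_run:
                    # POST to /posts/<id> updates only the fields sent.
                    _request(f"{site}/wp-json/wp/v2/posts/{post['id']}",
                             user, app_password, {"content": new})
        page += 1
```

Because `dry_run` defaults to `True`, a first run only reports which posts would change. Comparing a handful of those posts by hand, and taking a full database backup, before running with `dry_run=False` seems wise: a regex that is slightly too greedy could remove real article text across thousands of posts at once.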