An Intelligent Framework for Automated Social Media Content Creation Using Textual and Visual Cues

Mr. R. Ganeshmurthi; Mr. P Ramesh; Mrs. R. Krishna Lakshmi; Dr. C. Sathish Kumar; Mrs. S. Samundeeswari

Published: Aug 14, 2025

Keywords:

Deep Learning, Social Media Automation, Multimodal AI, Content Generation, Transformer Models, Engagement Optimization.

Mr. R. Ganeshmurthi

Assistant Professor, Department of Computer Science and Applications, SRM Institute of Science and Technology (FSH), Ramapuram, Chennai, India.

Mr. P Ramesh

Assistant Professor, Department of Computer Science and Applications, SRM Institute of Science and Technology (FSH), Ramapuram, Chennai, India.

Mrs. R. Krishna Lakshmi

Assistant Professor, Department of Computer Science and Applications, SRM Institute of Science and Technology (FSH), Ramapuram, Chennai, India.

Dr. C. Sathish Kumar

Assistant Professor, Department of Computer Science and Applications, SRM Institute of Science and Technology (FSH), Ramapuram, Chennai, India.

Mrs. S. Samundeeswari

Assistant Professor, Department of Computer Science and Applications, SRM Institute of Science and Technology (FSH), Ramapuram, Chennai, India.

Abstract

In the age of digital communication, social media has become a powerful platform for branding, information dissemination, and public engagement. However, consistently producing engaging, personalized, and context-aware content remains a significant challenge. This study presents a deep learning-based multimodal approach to automated social media content generation that uses textual, visual, and semantic signals to produce high-quality, platform-optimized posts. The objective is to enable scalable, human-like content creation that adapts to different audiences, topics, and engagement goals. The system is trained on a diverse dataset of more than 1.2 million social media posts from platforms such as Twitter, Instagram, and LinkedIn, including the associated images, hashtags, captions, and engagement metrics. Posts are categorized by domain, including e-commerce, public health, education, and entertainment, and each data point is enriched with metadata such as post timing, sentiment, and audience interaction level. The model employs a multimodal architecture that combines Transformer-based NLP models (such as BERT and GPT), Convolutional Neural Networks (CNNs) for visual analysis, and attention-based fusion mechanisms that align textual and visual inputs. A content planner module ensures contextual relevance, while a reinforcement learning layer optimizes for engagement metrics such as likes, shares, and comments. Experimental results show that the AI-generated content maintains linguistic fluency and visual relevance and outperforms baseline models in user engagement by up to 27%. This research offers a scalable solution for digital marketers, content strategists, and public agencies that aim to automate high-quality content creation while retaining personalization and contextual awareness.
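
The attention-based fusion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion layer, so the example assumes plain scaled dot-product cross-attention with no learned projections, where each text token (a stand-in for a Transformer output) attends over image region features (stand-ins for CNN outputs). The function names `softmax` and `cross_attention_fuse` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(text_tokens, image_regions):
    """Fuse each text token with an attention-weighted summary of image regions.

    text_tokens:   list of d-dimensional vectors (assumed Transformer outputs)
    image_regions: list of d-dimensional vectors (assumed CNN region features)
    Returns (fused, weights), where fused holds one 2d-dimensional vector per
    token (token features concatenated with their visual context) and weights
    holds the attention distribution each token places over the regions.
    """
    dim = len(text_tokens[0])
    scale = math.sqrt(dim)
    fused, weights = [], []
    for token in text_tokens:
        # Scaled dot-product score between this token and every image region.
        scores = [
            sum(t * r for t, r in zip(token, region)) / scale
            for region in image_regions
        ]
        attn = softmax(scores)
        # Visual context: attention-weighted average of the region features.
        context = [
            sum(a * region[i] for a, region in zip(attn, image_regions))
            for i in range(dim)
        ]
        fused.append(list(token) + context)  # concatenate text and visual parts
        weights.append(attn)
    return fused, weights


# Toy inputs (hypothetical values chosen only to make the behaviour visible).
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, weights = cross_attention_fuse(text_tokens, image_regions)
```

In this toy case the first token places most of its attention on the first image region, the one it is most similar to. A full system would normally add learned query, key, and value projections and multiple attention heads.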

Issue

Vol. 24 No. 01 (2025)

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.
