Podcastfy

Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI

Setup

First, make sure you have installed the podcastfy module and its dependencies, and configured the associated API keys. See Setup.

Getting Started

[9]:
# Import necessary modules
from podcastfy.client import generate_podcast

This is just a custom function we will use to embed audio in this Python notebook.

[ ]:
%pip install ipython
from IPython.display import Audio, display

def embed_audio(audio_file):
    """
    Embeds an audio file in the notebook, making it playable.

    Args:
        audio_file (str): Path to the audio file.
    """
    try:
        display(Audio(audio_file))
        print(f"Audio player embedded for: {audio_file}")
    except Exception as e:
        print(f"Error embedding audio: {str(e)}")

Generate a podcast from text

Single URL

This code demonstrates the process of generating a podcast from a single URL, in this case Wikipedia’s page on “Podcast”:

  1. Extract content from the URL
  2. Generate a Q&A transcript from the extracted content
  3. Convert the transcript to speech using a Text-to-Speech model
  4. Save the generated audio file to data/audio

[12]:
audio_file = generate_podcast(urls=["https://en.wikipedia.org/wiki/Podcast"])
2024-10-05 09:50:03,308 - podcastfy.client - INFO - Processing 1 links
[("Welcome to Podcastfy - Your Personal GenAI Podcast! Uh, what's up everyone? You know, it's funny how we use this technology every day, but have you ever stopped to think about the history of podcasts?", "I know what you mean. It's like, they're just there, you know? We hit play and boom - instant entertainment or information. But, where did this all start?"), ('Well, get this: the word "podcast" is actually a mashup of "iPod" and "broadcast"! Ben Hammersley, a journalist, first used it back in 2004.', "Wow, 2004? That's way earlier than I would've guessed! But, weren't MP3 players around before that?"), ('Totally! In fact, there was a company, i2Go, that offered a service kinda like podcasting back in 2000. It let people download news to their MP3 players. They were onto something, but it fizzled out quickly.', 'So, if that was happening in 2000, what really made podcasts take off later?'), ('It was a perfect storm of tech advancements. Apple launched iTunes with podcast support in 2005, which made listening SO much easier. That, plus cheaper recording tech and the rise of smartphones - it all just exploded from there.', 'That makes a lot of sense. It\'s interesting, though, that you mentioned Apple was such a driving force. Weren\'t there legal battles over the whole "pod" terminology?'), ('Oh yeah, big time. Apple got pretty aggressive going after companies using "pod" in their names, even sending out cease and desist letters. They claimed people associated "pod" so strongly with the iPod that it fell under their trademark. I mean, they even tried to trademark "podcast" itself!', 'Wow, really? Seems like a bit of a stretch, but I guess they wanted to protect their brand. So, aside from straight-up talk shows, what other types of podcasts have become popular?'), ("Oh man, there's like, a whole universe of podcasts now! You've got fiction podcasts that are basically like audio dramas, complete with actors, sound effects, the works. 
There's also the enhanced podcasts that combine audio with slideshows - super cool for educational stuff. And then, you can't forget the video podcasts! It's wild how much it's evolved from those early days.", 'Yeah, it really is amazing. And it seems like podcasting is only getting bigger. I mean, look at how many podcasts and episodes there are now!'), ("For sure! And it's not just about listening anymore. Live shows are becoming huge, too! It's like a whole new way for creators to connect with audiences. Who knows what the future holds for podcasting, but I'm along for the ride!", 'Me too! Until next time on Podcastfy - Your Personal GenAI Podcast.')]
2024-10-05 09:51:06,711 - podcastfy.client - INFO - Podcast generated successfully using openai TTS model
[13]:
# Embed the audio file generated from transcript
embed_audio(audio_file)
Audio player embedded for: ./data/audio/podcast_e1525fed48054896af5645c203138dca.mp3

It works, but the audio does not sound great. The default backend uses OpenAI’s TTS model for speech generation. In the next example, we use the ElevenLabs model, which in my experience improves results dramatically.
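The Q&A transcript logged above is printed as a list of (question, answer) tuples, which is hard to read. As a minimal sketch, the helper below renders such pairs as a speaker-labelled script. The function name `format_dialogue` and the default speaker labels are illustrative assumptions, not part of Podcastfy.

```python
def format_dialogue(qa_pairs, speakers=("Person1", "Person2")):
    """Render (question, answer) tuples as one speaker-labelled line per turn."""
    first, second = speakers
    lines = []
    for question, answer in qa_pairs:
        lines.append(f"{first}: {question}")
        lines.append(f"{second}: {answer}")
    return "\n".join(lines)


# One exchange, shortened from the transcript logged above
pairs = [
    ("Have you ever stopped to think about the history of podcasts?",
     "But, where did this all start?"),
]
print(format_dialogue(pairs))
```

Passing different labels through `speakers` is enough to adapt the output to other naming conventions.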

Multiple URLs

Here, we go one step further and generate a podcast from multiple sources:

  1. Podcastfy’s own GitHub README file
  2. A YouTube video about Google’s NotebookLM going viral

[16]:
# Define multiple URLs to process
urls = [
    "https://github.com/souzatharsis/podcastfy/blob/main/README.md",
    "https://www.youtube.com/watch?v=jx2imp33glc"
]

# Generate podcast from multiple URLs
audio_file_multi = generate_podcast(
    urls=urls,
    tts_model="elevenlabs"
)
2024-10-05 10:06:04,940 - podcastfy.client - INFO - Processing 2 links
2024-10-05 10:07:25,914 - podcastfy.client - INFO - Podcast generated successfully using elevenlabs TTS model
[17]:
print(f"Podcast generated and saved as: {audio_file_multi}")

# Embed the generated audio file
embed_audio(audio_file_multi)
Podcast generated and saved as: ./data/audio/podcast_829a531a20334c949f76e077b846cc7f.mp3
Audio player embedded for: ./data/audio/podcast_829a531a20334c949f76e077b846cc7f.mp3

This AI-generated transcript is interesting for several reasons:

  • Realism: The transcript demonstrates the ability of AI to generate realistic, conversational dialogue. It includes elements like filler words (“uh”, “umm”), casual language, and back-and-forth banter that mimic human conversation patterns.

  • Irony: There’s an ironic element in that the transcript presents AI-generated characters expressing concern about the implications of AI-generated content on their own (fictional) careers as podcasters.

  • Ethical and legal concerns: The characters discuss potential implications of this technology, including copyright issues, voice replication without consent, and the impact on human content creators. This reflects real-world debates surrounding AI-generated content.

  • Meta-commentary: The podcast is an AI-generated discussion about AI-generated content, specifically AI-created podcasts. This creates an intriguing layer of self-reference, as an AI-generated conversation discusses the ability of AI to generate conversations.

However, this particular transcript did not pick up on Podcastfy’s README content and focused solely on the YouTube video. This can happen because the AI podcast hosts may pick a particular concept from one of the provided sources and develop the conversation around it. There is room for improvement in guiding the AI podcast hosts to strike a good balance of content coverage across the provided input sources.
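One rough way to spot this kind of imbalance is to count how often keywords tied to each source show up in the transcript. The sketch below is only a heuristic under stated assumptions: the function `keyword_coverage`, the sample snippet, and the hand-picked single-word keywords are all hypothetical and not part of Podcastfy.

```python
import re
from collections import Counter


def keyword_coverage(transcript, source_keywords):
    """Count case-insensitive occurrences of each source's keywords in the transcript."""
    counts = Counter(re.findall(r"[a-z0-9]+", transcript.lower()))
    return {
        source: sum(counts[keyword.lower()] for keyword in keywords)
        for source, keywords in source_keywords.items()
    }


# Hypothetical transcript snippet and hand-picked single-word keywords per source
snippet = "NotebookLM went viral this week. NotebookLM turns documents into audio."
coverage = keyword_coverage(
    snippet,
    {"readme": ["podcastfy"], "youtube": ["notebooklm", "viral"]},
)
print(coverage)
```

A source whose count is zero, like `readme` here, was probably ignored by the hosts. Keywords must be single words, since the transcript is split on non-alphanumeric characters.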

Generate transcript only

There is also the option to generate only the transcript from the input URLs. This allows users to edit or process transcripts before downstream audio generation.

[18]:
# Generate transcript only
transcript_file = generate_podcast(
    urls=["https://github.com/souzatharsis/podcastfy/blob/main/README.md"],
    transcript_only=True
)
2024-10-05 10:15:06,561 - podcastfy.client - INFO - Processing 1 links
2024-10-05 10:15:29,500 - podcastfy.client - INFO - Transcript generated successfully
[19]:

print(f"Transcript generated and saved as: {transcript_file}")
# Read and print the first 100 characters from the transcript file
with open(transcript_file, 'r') as file:
    transcript_content = file.read(100)
    print(f"First 100 characters of the transcript: {transcript_content}")
Transcript generated and saved as: ./data/transcripts/transcript_f6ab3ee241444e999ed4d1142564b9fe.txt
First 100 characters of the transcript: <Person1> "Welcome to Podcastfy - YOUR Personal GenAI Podcast! You know, the other day I was struggl
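Since the transcript is saved as a plain text file, it can be edited with ordinary file operations before audio generation. The sketch below applies simple text replacements and writes the result to a new file. The helper `edit_transcript` and the file names are hypothetical, and the demo runs on a throwaway file rather than a real transcript.

```python
import tempfile
from pathlib import Path


def edit_transcript(src, dst, replacements):
    """Copy a transcript to dst, applying plain-text replacements along the way."""
    text = Path(src).read_text(encoding="utf-8")
    for old, new in replacements.items():
        text = text.replace(old, new)
    Path(dst).write_text(text, encoding="utf-8")
    return str(dst)


# Demo on a throwaway file; in the notebook, src would be transcript_file
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "transcript.txt"
    src.write_text(
        '<Person1> "Welcome to Podcastfy - YOUR Personal GenAI Podcast!"',
        encoding="utf-8",
    )
    edited = edit_transcript(src, Path(tmp) / "transcript_edited.txt", {"YOUR": "Your"})
    print(Path(edited).read_text(encoding="utf-8"))
```

The original file is left untouched, so the edited copy can be passed to `generate_podcast` through its `transcript_file` argument while the raw transcript stays available for comparison.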

Generate audio from transcript

Users can also generate audio from a given transcript. Here, we generate a podcast from the transcript generated in the previous section. This allows users to re-use previously generated transcripts or provide their own custom transcripts for podcast generation.

[23]:
# Generate podcast from existing transcript file
audio_file_from_transcript = generate_podcast(
    transcript_file=transcript_file,
    tts_model="elevenlabs"
)
2024-10-05 10:28:37,745 - podcastfy.client - INFO - Using transcript file: ./data/transcripts/transcript_f6ab3ee241444e999ed4d1142564b9fe.txt
2024-10-05 10:30:17,300 - podcastfy.client - INFO - Podcast generated successfully using elevenlabs TTS model
[24]:
# Embed the audio file generated from transcript
embed_audio(audio_file_from_transcript)
Audio player embedded for: ./data/audio/podcast_c06620d918d4419884f9c7558a4a2cf1.mp3

Generate audio from PDF

One or many PDFs can be processed in the same way as URLs by simply passing the corresponding file paths.

[ ]:
audio_file_from_pdf = generate_podcast(urls=["./data/pdf/s41598-024-58826-w.pdf"])

This is a Scientific Reports article about climate change in France. We have pre-generated the podcast and saved it in our data directory. Let’s listen to it:

[7]:
file_path = "./data/audio/Agro_paper.mp3"
# Embed the audio file generated from transcript
embed_audio(file_path)
Audio player embedded for: ./data/audio/Agro_paper.mp3
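To process many PDFs at once, the file paths can be collected from a directory first. The following sketch uses only the standard library. The helper name `collect_pdfs` is an assumption for illustration, and the demo uses empty placeholder files instead of real papers.

```python
import tempfile
from pathlib import Path


def collect_pdfs(directory):
    """Return sorted paths of the .pdf files directly inside directory."""
    return sorted(str(path) for path in Path(directory).glob("*.pdf"))


# Demo with empty placeholder files; a real run would point at ./data/pdf
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.pdf", "a.pdf", "notes.txt"):
        (Path(tmp) / name).touch()
    pdf_paths = collect_pdfs(tmp)
    print([Path(p).name for p in pdf_paths])

# The list could then be passed on, e.g. generate_podcast(urls=pdf_paths)
```

Sorting keeps the input order stable between runs, which helps when comparing generated podcasts.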

Generate podcast from images

Images can be provided as input to generate a podcast. This can be useful when users want to generate a podcast from images such as works of art, physical spaces, historical events, etc. One or many images can be provided as input. The following example generates a podcast from two images: Senecio, 1922 (Paul Klee) and Connection of Civilizations (2017) by Gheorghe Virtosu.

[11]:
# Generate podcast from input images
image_paths = [
    "./data/images/Senecio.jpeg",
    "./data/images/connection.jpg"
]

audio_file_from_images = generate_podcast(image_paths=image_paths)

print("Podcast generated from images:", audio_file_from_images)
[('"Welcome to PODCASTFY - Your Personal Generative AI Podcast. Today, we\'re diving into the vibrant world of abstract art! Buckle up!"', '"I\'m all ears! Abstract art can be so captivating, but also a bit puzzling sometimes. What kind of abstract pieces are we looking at today?"'), ('"Well, imagine a canvas bathed in warm, almost fiery, orange hues. On this canvas, we see a circular face, divided into sections like a carefully pieced-together puzzle. The eyes are striking - bright red dots that seem to stare right at you. It\'s geometric, yet full of emotion. That\'s our first piece."', '"Wow, I can practically feel the energy radiating from that description! It sounds like the artist used simple shapes and colors to create something incredibly powerful. What about the second piece?"'), ('"Ah, the second one is a whole other story! Imagine the same vibrant orange, but this time, it\'s like a wild dance of brushstrokes, a whirlwind of texture. The figure here is more abstract, with jagged lines, bold shapes, and a single blue eye peering out from the chaos. It\'s dynamic, almost chaotic, but undeniably captivating."', '"It\'s fascinating how both pieces use a similar color palette but evoke completely different feelings. The first one sounds almost serene in its geometric precision, while the second one sounds like it\'s bursting with raw energy. It really shows the range of abstract art, doesn\'t it?"'), ('"Absolutely! And that\'s the beauty of it, isn\'t it? Abstract art invites us to interpret, to feel, to connect with the emotions the artist is conveying through color, shape, and form. It\'s a conversation between the artist and the viewer, with no right or wrong answers."', '"I totally agree! It\'s like a visual puzzle that each person gets to solve in their own way. No wonder abstract art continues to fascinate and inspire people all over the world."'), ('"That\'s all the time we have for today. Thanks for tuning in to PODCASTFY. 
Until next time, keep exploring the fascinating world of art!"', 'Bye Bye!')]
2024-10-12 18:33:15,834 - podcastfy.client - INFO - Podcast generated successfully using openai TTS model
Podcast generated from images: ./data/audio/podcast_9e4e617c7ab546ada4f103521a330468.mp3

Here is the generated podcast, which we have pre-saved in the data directory.

[3]:
# Embed the audio file generated from images
embed_audio("data/audio/abstract_art.mp3")
Audio player embedded for: ../../data/audio/abstract_art.mp3

Customization

Podcastfy offers a range of customization options to tailor your AI-generated podcasts. Whether you’re creating educational content, storytelling experiences, or anything in between, these configuration options allow you to fine-tune your podcast’s tone, length, and format. See Conversation Configuration for more details.

[2]:
# Example: In-depth Tech Debate Podcast

# Define a custom conversation config for a tech debate podcast
tech_debate_config = {
    "word_count": 4000,  # Longer content for in-depth discussions
    "conversation_style": ["analytical", "argumentative"],
    "roles_person1": "tech optimist",
    "roles_person2": "tech skeptic",
    "dialogue_structure": ["Topic Introduction", "Pro Arguments", "Con Arguments", "Rebuttal", "Audience Questions", "Conclusion"],
    "podcast_name": "Tech Crossroads",
    "podcast_tagline": "Where Innovation Meets Scrutiny",
    "output_language": "English",
    "engagement_techniques": ["statistics", "case studies", "ethical dilemmas"],
    "creativity": 0.3  # Lower creativity for more factual content
}

# Generate a tech debate podcast about artificial intelligence
tech_debate_podcast = generate_podcast(
    urls=["https://en.wikipedia.org/wiki/Artificial_intelligence",
          "https://en.wikipedia.org/wiki/Ethics_of_artificial_intelligence"],
    conversation_config=tech_debate_config,
    tts_model="openai"  # Using OpenAI for clear, neutral voices
)

print("Tech Debate Podcast generated:", tech_debate_podcast)

2024-10-10 02:19:01,046 - podcastfy.client - INFO - Processing 2 links
[('"Welcome to Tech Crossroads - Where Innovation Meets Scrutiny! Today, we\'re diving deep into the fascinating world of artificial intelligence, or AI as it\'s more commonly known. Uh, it\'s a field that\'s been making waves for decades, and now, it\'s really starting to impact our everyday lives in ways we never imagined."', '"I agree, AI is everywhere these days. From the smartphones in our pockets to the algorithms that curate our news feeds, it\'s becoming increasingly difficult to escape its influence. But while AI offers incredible potential, I can\'t help but feel a sense of unease about its rapid development. It\'s like we\'re opening Pandora\'s Box, and we\'re not entirely sure what we\'ll find inside."'), ('"I see your point. AI does raise some serious ethical concerns, and it\'s crucial that we address them proactively. But let\'s not forget the incredible benefits AI brings to the table. Think about the advancements in healthcare, where AI is helping doctors diagnose diseases earlier and more accurately. Or in transportation, where self-driving cars have the potential to reduce accidents and save lives."', '"Those are valid points, but I\'m still wary of the potential downsides. One of my biggest concerns is the issue of algorithmic bias. We\'ve already seen instances where AI systems have perpetuated existing societal biases, leading to discrimination against certain groups. For example, facial recognition algorithms have been shown to be less accurate for people with darker skin tones, which could have serious implications for law enforcement and security."'), ('"Interesting. You\'re right, algorithmic bias is a significant problem, and it\'s something that needs to be addressed head-on. The good news is that researchers are actively working on developing techniques to mitigate bias in AI systems. 
For instance, they\'re exploring ways to ensure that training data is more representative of diverse populations and that algorithms are designed to be more fair and equitable."', '"I\'m glad to hear that, but I\'m also concerned about the lack of transparency in many AI systems. Often, even the developers themselves don\'t fully understand how these complex algorithms work. This makes it difficult to identify and correct biases, and it raises questions about accountability when things go wrong."'), ('"Got it. Transparency is indeed crucial, and there\'s a growing movement towards developing explainable AI, where the decision-making processes of AI systems are more understandable to humans. This will not only help us identify and address biases but also build trust in AI technology."', '"Another concern I have is the potential for AI to exacerbate existing inequalities. As AI becomes more sophisticated, it could automate a wide range of jobs, potentially leading to mass unemployment and widening the gap between the rich and the poor."'), ('"I understand your concern about technological unemployment. It\'s a valid point, and it\'s something that policymakers need to consider seriously. However, history has shown that technological advancements often create new jobs, even as they displace old ones. The key is to ensure that workers have the skills and training they need to adapt to the changing job market."', '"That\'s true, but this time feels different. AI has the potential to automate not just manual labor but also cognitive tasks that were once thought to be the exclusive domain of humans. This could have a profound impact on the job market, and we need to be prepared for the challenges it presents."'), ('"You raise a valid point. The nature of work is undoubtedly changing, and we need to adapt our education and training systems to prepare people for the jobs of the future. 
This includes fostering skills such as critical thinking, creativity, and problem-solving, which are less likely to be automated."', '"Beyond the economic implications, I\'m also concerned about the potential for AI to be used for malicious purposes. Imagine AI-powered surveillance systems that track our every move or autonomous weapons that can kill without human intervention. These are terrifying possibilities that we need to guard against."'), ('"I agree, the potential for AI to be weaponized is a serious concern. That\'s why it\'s crucial that we develop international regulations and ethical guidelines for the development and use of AI, especially in sensitive areas like military applications."', '"I\'m glad to hear that, but I\'m not sure if regulations alone will be enough. We also need to foster a culture of responsible AI development, where ethics are considered from the very beginning of the design process."'), ('"Absolutely. We need to ensure that AI is developed and used in a way that benefits humanity as a whole, not just a select few. This requires a multi-faceted approach, involving researchers, policymakers, industry leaders, and the public."', '"One final thought: as AI becomes more powerful, it raises fundamental questions about what it means to be human. If machines can think, learn, and even create, what does that say about our own unique abilities and our place in the world?"'), ('"That\'s a profound question, and one that philosophers have been grappling with for centuries. As AI continues to evolve, it will undoubtedly challenge our understanding of ourselves and our relationship with technology. It\'s a journey that will require careful consideration, open dialogue, and a commitment to shaping a future where AI serves humanity, not the other way around."', 'Tchau!')]
2024-10-10 02:21:30,016 - podcastfy.client - INFO - Podcast generated successfully using openai TTS model
Tech Debate Podcast generated: ('./data/transcripts/transcript_7e84bb13b26f4ab78dda30d04d461838.txt', './data/audio/podcast_c8f53545fefd44569dbebd4fa739e2b9.mp3')
[8]:
file_path = "./data/audio/podcast_c8f53545fefd44569dbebd4fa739e2b9.mp3"
# Embed the audio file generated from transcript
embed_audio(file_path)
Audio player embedded for: ./data/audio/podcast_c8f53545fefd44569dbebd4fa739e2b9.mp3
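When experimenting with several variants of a configuration such as `tech_debate_config`, it can help to derive each variant from a shared base instead of copying the whole dictionary. The sketch below shows one way to do that. The helper `with_overrides` is hypothetical, and the keys simply mirror those used in the example above.

```python
import copy


def with_overrides(base_config, **overrides):
    """Return a deep copy of base_config with the given keys replaced."""
    config = copy.deepcopy(base_config)
    config.update(overrides)
    return config


# Keys mirror the tech debate example above; the values are illustrative
base = {
    "word_count": 4000,
    "conversation_style": ["analytical", "argumentative"],
    "creativity": 0.3,
}
short_version = with_overrides(base, word_count=1000)
print(short_version["word_count"], base["word_count"])
```

The deep copy keeps list values such as `conversation_style` independent, so changing one variant never alters the base. The resulting dictionary can be passed as `conversation_config`, as in the cell above.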

Multilingual Support

Description of how to generate non-English content TBD. See Notes of Caution before starting to customize to avoid unexpected results. For now, here are a couple of audio examples:

French (fr)

Generates a podcast from the AgroClim website, a French government service unit that studies the climate and its impacts on agroecosystems.

[6]:
embed_audio("./data/audio/podcast_FR_AGRO.mp3")
Audio player embedded for: ./data/audio/podcast_FR_AGRO.mp3

Portuguese (pt-br)

Generates a podcast in Brazilian Portuguese from a news article on the most recent polls for São Paulo’s 2024 elections.

[5]:
embed_audio("./data/audio/podcast_thatupiso_BR.mp3")
Audio player embedded for: ./data/audio/podcast_thatupiso_BR.mp3