Automatic Blog Posting for Medium using Naver Blog contents — Day 2

Sep 10, 2020

“A motivational idea that makes me move forward at least a little everyday.”

Continuing from the requirement analysis, I am going to talk about the Proof of Concept. I’ve got several items to prove to see if I can actually implement the derived requirements.

These are the items:

Is it possible to crawl Naver Blog using Scrapy, based on the URL patterns in someone’s account? If not, what other options are there for accessing specific content?
Is it possible to process the crawled data into an organized form that I can save and reuse to make Medium API calls?

Okay then, let’s handle them one by one.

How to access Naver Blog contents that I want to crawl?

To achieve this objective, I came up with more items to check and tried to answer them below.

Q1. Can I get a list of blog posts in someone’s account?

→ Yes.

Q2. How can I get the list of blog posts?

→ Option 1) I can get a partial list of posts by loading the HTML page at this URL pattern: https://blog.naver.com/PostList.nhn?blogId=elle81054. Then I can scrape the list of posts from the table element with the “blog2_list blog2_categorylist” class. The problem is that I can’t get the full list this way, because the list is paginated by default. The URL doesn’t even change when I move between pages, which leaves me no option but to try a different approach.
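As a rough illustration of Option 1, the sketch below collects the links found inside a table carrying both class names. It uses Python’s standard-library html.parser rather than Scrapy so that it runs on its own. The sample markup and its href values are made up for the example, and the real page structure may differ.

```python
from html.parser import HTMLParser


class PostListParser(HTMLParser):
    """Collect href values of links found inside the post-list table."""

    # Class names taken from the article; everything else here is illustrative.
    TARGET_CLASSES = {"blog2_list", "blog2_categorylist"}

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # > 0 while we are inside the target table
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            classes = set((attrs.get("class") or "").split())
            # Enter the target table, or track a table nested inside it.
            if self.table_depth or self.TARGET_CLASSES <= classes:
                self.table_depth += 1
        elif tag == "a" and self.table_depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1


# Hypothetical sample markup, not copied from a real Naver page.
SAMPLE_HTML = """
<table class="blog2_list blog2_categorylist">
  <tr><td><a href="/post/1">First post</a></td></tr>
  <tr><td><a href="/post/2">Second post</a></td></tr>
</table>
<a href="/elsewhere">A link outside the table</a>
"""

list_parser = PostListParser()
list_parser.feed(SAMPLE_HTML)
print(list_parser.links)
```

Only the first page of posts would be visible to a parser like this, which is the limitation described above.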

→ Option 2) Using the RSS feed is an alternative to Option 1.

Fortunately, I can get the content metadata from the RSS feed by accessing “rss.blog.naver.com/${accountId}.xml”.

Basically, the RSS feed XML file gives me some hints for composing a JSON schema for saving blog content, as follows:

Category
Title
Blog Link
Description
Publish Date
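To make the fields above concrete, here is a minimal sketch that turns RSS items into JSON-ready dictionaries using the standard library. It assumes the feed follows the usual RSS 2.0 element names (category, title, link, description, pubDate) and that the feed is served over https. The sample XML, the post link in it, and the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def rss_url(account_id):
    """Build the feed URL from the pattern in the article (https is assumed)."""
    return f"https://rss.blog.naver.com/{account_id}.xml"


def parse_feed(rss_text):
    """Return one dictionary per <item>, keeping the feed's original order."""
    root = ET.fromstring(rss_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "category": item.findtext("category", default=""),
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
            "publish_date": item.findtext("pubDate", default=""),
        })
    return posts


# Hypothetical sample shaped like a standard RSS 2.0 feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample blog</title>
    <item>
      <category>Travel</category>
      <title>First post</title>
      <link>https://blog.naver.com/elle81054/1</link>
      <description>Short summary of the post</description>
      <pubDate>Thu, 10 Sep 2020 09:00:00 +0900</pubDate>
    </item>
  </channel>
</rss>
"""

posts = parse_feed(SAMPLE_RSS)
print(rss_url("elle81054"))
print(json.dumps(posts, ensure_ascii=False, indent=2))
```

In a real run, the text passed to parse_feed would be the response body fetched from the URL that rss_url returns.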

Q3. How am I going to crawl the main content from the Blog Link?

→ When I visited a blog link that I got from the RSS feed, I found some repetitive patterns among the HTML elements.

Since the only content I need from a blog post is the text and images in the right sequence, it seems that I can get those from the div element with the “se-main-container” class.

All the content blocks inside the se-main-container div are classed as “se-component”. Image content has the se-image class, text content has se-text, and quoted text has se-quotation.

In a nutshell, using those patterns, I will extract the necessary content in the right order and the right format.

Then, what’s next?

I am going to figure out how to make a POST API call for publishing a post on Medium.
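The extraction idea can be sketched roughly as follows, again with the standard-library html.parser so the example is self-contained. The class names come from the patterns described above. The sample markup, the output dictionary layout, and the parser class name are assumptions made for illustration, and real posts are likely to nest more deeply.

```python
from html.parser import HTMLParser

# Map the component classes from the article to simple block kinds.
KINDS = {"se-text": "text", "se-image": "image", "se-quotation": "quote"}


class PostBodyParser(HTMLParser):
    """Collect se-component blocks inside se-main-container, in page order."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0            # current nesting depth of <div> tags
        self.container_depth = None   # depth where se-main-container opened
        self.component_depth = None   # depth where the current component opened
        self.current = None           # block being filled, if any
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            self.div_depth += 1
            classes = (attrs.get("class") or "").split()
            if self.container_depth is None and "se-main-container" in classes:
                self.container_depth = self.div_depth
            elif (self.container_depth is not None and self.current is None
                  and "se-component" in classes):
                kind = next((KINDS[c] for c in classes if c in KINDS), "other")
                self.current = {"kind": kind, "text": "", "images": []}
                self.component_depth = self.div_depth
        elif tag == "img" and self.current is not None and attrs.get("src"):
            self.current["images"].append(attrs["src"])

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self.current is not None and self.div_depth == self.component_depth:
            # The component's own <div> closed: normalize whitespace and store it.
            self.current["text"] = " ".join(self.current["text"].split())
            self.blocks.append(self.current)
            self.current = None
        elif self.div_depth == self.container_depth:
            self.container_depth = None
        self.div_depth -= 1

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"] += data


# Hypothetical sample markup using the class names from the article.
SAMPLE_POST = """
<div class="se-main-container">
  <div class="se-component se-text"><p>Hello Naver</p></div>
  <div class="se-component se-image"><img src="https://example.com/a.jpg"></div>
  <div class="se-component se-quotation"><blockquote>A quoted line</blockquote></div>
</div>
<div class="se-component se-text">Outside the container</div>
"""

body_parser = PostBodyParser()
body_parser.feed(SAMPLE_POST)
for block in body_parser.blocks:
    print(block)
```

Because the blocks are appended as their closing tags are seen, the resulting list keeps text, images, and quotes in the same sequence as the original post.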

TO BE CONTINUED…

Written by 김영석
