KazKN

Posted on Jun 10

Vestiaire Collective Scraper: 7 Checks I Built After Reading 11,263 Reviews

#webscraping #ecommerce #automation #data

The first version of my Vestiaire Collective scraper would have been boring.

Search page. Product title. Price. URL.

That is how most marketplace scrapers start.

Then I read 11,263 public App Store and Google Play reviews, plus a Trustpilot sample, and the product brief changed.

People were not only talking about clothes, bags, or luxury resale. They were talking about uncertainty:

"seller"
"sold"
"fake"
"refund"
"fees"
"condition"
"customer service"
"money back"
"doesn't work"

That language points to a different kind of scraper.

Not just a product-card scraper.

A marketplace intelligence scraper.

So I built the Vestiaire Collective Smart Scraper on Apify around the checks that help reduce uncertainty: sold items, seller countries, price tracking, item condition, product details, and duplicate or suspicious listing signals.

Disclosure: this article contains affiliate links to my Apify Actor.

The short version

If you are scraping a resale marketplace, title and price are not enough.

Here is the checklist I ended up building around:

Track live listings and sold items separately.
Separate market country from seller country.
Extract condition and condition source.
Add precision filters for exact models.
Enrich product details beyond the search card.
Track price changes, not only current price.
Surface duplicate/risk signals without accusing anyone.

That checklist is more useful than another "how to scrape product cards" tutorial.

Reviews are product research

A normal ecommerce scrape might output this:

{
  "title": "Designer bag",
  "price": 1200,
  "url": "https://..."
}

That works for a simple catalog.

Vestiaire Collective is not a simple catalog.

It is a resale marketplace where the useful questions are more specific:

Is the item still available or already sold?
Where is the seller located?
Is the condition clearly exposed?
Is the product page richer than the search result?
Did the price change since the last run?
Does the same item appear in several market pages?
Are similar listings clustered in a way that deserves manual review?

The reviews made this obvious.

When users talk about refund, fees, commission, and money back, they are asking for better price context.

When they talk about fake, authentication, photos, description, and condition, they are asking for better product and risk context.

When they talk about seller, buyer, sold, and selling, they are asking for marketplace intelligence.

The 7 checks

1. Live and sold items are different datasets

Active listings show supply.

Sold listings show demand.

If you only scrape active listings, you miss the part of the market that actually moved.

That is why the Actor supports includeSoldItems:

{
  "searchTerms": ["chanel classic flap"],
  "countries": ["FR", "IT", "DE", "GB"],
  "includeSoldItems": true,
  "maxListings": 100,
  "maxDatasetRecords": 150,
  "maxPagesPerCountry": 2
}

For resale research, sold signals are often more valuable than active prices: an active price is only an ask, while a sold listing shows that a buyer actually committed.
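As a rough sketch of what the split looks like after a run, here is how I might separate exported rows in Python. It assumes each row carries the recordType, isSold, and price fields shown in the output example later in this post, and that all prices share one currency. The helper names and the sample rows are made up for illustration.

```python
from statistics import median


def split_live_and_sold(rows):
    """Return (live, sold) listing rows, split on the isSold flag."""
    live, sold = [], []
    for row in rows:
        # Skip non-listing records such as risk signals.
        if row.get("recordType", "listing") != "listing":
            continue
        (sold if row.get("isSold") else live).append(row)
    return live, sold


def median_price(rows):
    """Median price of the rows, or None if no row has a numeric price."""
    prices = [r["price"] for r in rows if isinstance(r.get("price"), (int, float))]
    return median(prices) if prices else None


# Illustrative rows only, not real marketplace data.
sample_rows = [
    {"recordType": "listing", "listingId": "a1", "isSold": False, "price": 1200},
    {"recordType": "listing", "listingId": "a2", "isSold": False, "price": 1000},
    {"recordType": "listing", "listingId": "a3", "isSold": True, "price": 900},
    {"recordType": "risk_signal", "listingIds": ["a1", "a2"]},
]

live, sold = split_live_and_sold(sample_rows)
print("live:", len(live), "median price:", median_price(live))
print("sold:", len(sold), "median price:", median_price(sold))
```

Keeping the two lists apart means supply-side and demand-side numbers never get averaged together by accident.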

2. Market country is not seller country

This caused confusion during testing, so it is worth spelling out.

country is the Vestiaire market page searched.

sellerCountry is where the seller or item appears to be located in the public listing data.

If you want to compare market pages, use countries:

{
  "countries": ["FR", "IT", "DE", "GB"]
}

If you only want sellers from France, search broadly and filter narrowly:

{
  "countries": ["ALL"],
  "sellerCountries": ["FR"]
}

That distinction matters because the same item can show up on several market pages.

The seller location is often the real constraint.
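To make the two fields concrete, here is a minimal Python sketch that treats them separately on exported rows. It assumes rows expose listingId, country, and sellerCountry as in the output example later in this post. The function names and sample rows are hypothetical.

```python
from collections import defaultdict


def markets_by_listing(rows):
    """Map each listingId to the set of market pages (country) it appeared on."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["listingId"]].add(row["country"])
    return dict(seen)


def from_seller_countries(rows, seller_countries):
    """Keep rows whose sellerCountry is in the wanted set (case-insensitive)."""
    wanted = {code.upper() for code in seller_countries}
    return [r for r in rows if (r.get("sellerCountry") or "").upper() in wanted]


# Illustrative rows: listing "A" was found on two market pages.
sample_rows = [
    {"listingId": "A", "country": "FR", "sellerCountry": "IT"},
    {"listingId": "A", "country": "DE", "sellerCountry": "IT"},
    {"listingId": "B", "country": "FR", "sellerCountry": "FR"},
]

print(markets_by_listing(sample_rows))
print([r["listingId"] for r in from_seller_countries(sample_rows, ["FR"])])
```

Listing "A" was discovered on the FR market page, yet it drops out of a French-seller filter because its seller is in Italy.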

3. Condition has to be visible in the output

Review language around "condition", "description", "photos", and "not as described" came up too often to ignore.

The scraper should not hide condition in an unstructured blob.

It should expose it as a field:

{
  "condition": "Very good condition",
  "conditionSource": "vestiaire"
}

And it should let users filter before collecting too much noise:

{
  "itemConditions": ["3", "4"]
}

In this Actor:

3 means very good condition.
4 means good condition.

4. Broad search terms create messy datasets

A query like chanel classic flap can return nearby but irrelevant results.

That is not a scraper bug. It is how marketplace search behaves.

So I added a precision filter:

{
  "searchTerms": ["chanel classic flap"],
  "requiredKeywords": ["classic flap"]
}

This lets the user encode the model they actually care about instead of cleaning the dataset manually afterward.
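I am not documenting the Actor's exact matching rule here. The sketch below shows one plausible way a required-keyword filter could work: every keyword must appear in the title, ignoring case and extra whitespace. The function name and the titles are illustrative assumptions.

```python
def matches_required_keywords(title, required_keywords):
    """True if every keyword occurs in the title (case/whitespace-insensitive)."""
    normalized_title = " ".join(title.lower().split())
    return all(
        " ".join(keyword.lower().split()) in normalized_title
        for keyword in required_keywords
    )


# Illustrative titles, not real listings.
titles = [
    "Chanel Classic  Flap bag",
    "Chanel Boy flap bag",
    "CHANEL classic flap wallet on chain",
]

kept = [t for t in titles if matches_required_keywords(t, ["classic flap"])]
print(kept)
```

A substring rule like this is deliberately strict: it drops the "Boy flap" result but still keeps a wallet whose title contains the phrase, so some manual review can remain.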

5. Search cards are not enough

Search results are useful for discovery.

Product detail pages are useful for decisions.

That is why includeDetails exists.

It can add richer public product information when available, such as title, brand, category, materials, dimensions, item condition, images, and other product page fields.

Use it when quality matters more than speed:

{
  "includeDetails": true
}

Leave it off for quick market scans.

6. Price history beats one-off price extraction

A one-time price is a snapshot.

A repeated scrape becomes monitoring.

The Actor can keep tracking state between runs and build a price history from public data for known listings:

{
  "listingId": "67486244",
  "priceHistory": [
    { "price": 1200, "currency": "EUR", "observedAt": "2026-06-01T12:00:00.000Z" },
    { "price": 1000, "currency": "EUR", "observedAt": "2026-06-08T12:00:00.000Z" }
  ],
  "displayStatus": "Available"
}

That opens up more useful workflows:

price-drop monitoring;
deal alerts;
sell-through research;
recurring watchlists;
cross-country price comparison.
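As a sketch of the first workflow, price-drop monitoring, here is how a record with priceHistory could be turned into a simple change summary in Python. It assumes the record shape shown above. The function names and the summary keys (from, to, delta, percent) are my own illustrative choices.

```python
from datetime import datetime


def parse_observed_at(value):
    """Parse an ISO 8601 timestamp; older Python versions reject a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def price_change(record):
    """Compare the earliest and latest observations in priceHistory."""
    history = sorted(
        record.get("priceHistory", []),
        key=lambda point: parse_observed_at(point["observedAt"]),
    )
    if len(history) < 2:
        return None  # nothing to compare yet
    first, last = history[0], history[-1]
    if first["currency"] != last["currency"] or not first["price"]:
        return None  # avoid cross-currency math and division by zero
    delta = last["price"] - first["price"]
    return {
        "listingId": record["listingId"],
        "from": first["price"],
        "to": last["price"],
        "delta": delta,
        "percent": round(100 * delta / first["price"], 1),
    }


record = {
    "listingId": "67486244",
    "priceHistory": [
        {"price": 1200, "currency": "EUR", "observedAt": "2026-06-01T12:00:00.000Z"},
        {"price": 1000, "currency": "EUR", "observedAt": "2026-06-08T12:00:00.000Z"},
    ],
    "displayStatus": "Available",
}

change = price_change(record)
if change and change["delta"] < 0:
    print(f"{change['listingId']}: {change['from']} -> {change['to']} ({change['percent']}%)")
```

Sorting by observedAt first means the comparison still works if the history arrives out of order.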

7. Risk signals should stay conservative

Reviews use strong words: fake, scam, avoid, fraud, stolen.

A scraper should not turn those emotions into accusations.

It should surface review queues.

That is why duplicate and suspicious listing outputs are framed as signals:

{
  "recordType": "risk_signal",
  "signalType": "similar_listing_cluster",
  "confidence": "review_required",
  "listingIds": ["..."],
  "sharedFields": ["brand", "title", "image", "priceBand"]
}

This is not proof.

It is a shortlist for manual inspection.

That is the responsible level of confidence for a luxury resale workflow.
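To show what "conservative" can mean in code, here is a minimal clustering sketch in Python. It is not the Actor's detection logic. It simply groups listings that share brand, normalized title, currency, and a price band, and it emits records shaped like the signal above. The band width of 250 and the function name are assumptions, and image comparison is left out.

```python
from collections import defaultdict

PRICE_BAND_WIDTH = 250  # hypothetical band size, in currency units


def similar_listing_clusters(rows, band_width=PRICE_BAND_WIDTH):
    """Group distinct listings that share brand, title, currency, and price band."""
    groups = defaultdict(set)
    for row in rows:
        key = (
            row["brand"].strip().lower(),
            " ".join(row["title"].lower().split()),
            row["currency"],
            int(row["price"] // band_width),
        )
        # A set keeps one listing seen on several market pages from
        # counting as a cluster with itself.
        groups[key].add(row["listingId"])
    return [
        {
            "recordType": "risk_signal",
            "signalType": "similar_listing_cluster",
            "confidence": "review_required",
            "listingIds": sorted(ids),
            "sharedFields": ["brand", "title", "priceBand"],
        }
        for ids in groups.values()
        if len(ids) > 1
    ]


# Illustrative rows only.
sample_rows = [
    {"listingId": "A", "brand": "Chanel", "title": "Classic flap bag", "price": 1000, "currency": "EUR"},
    {"listingId": "B", "brand": "Chanel", "title": "classic  flap bag", "price": 1100, "currency": "EUR"},
    {"listingId": "A", "brand": "Chanel", "title": "Classic flap bag", "price": 1000, "currency": "EUR"},
    {"listingId": "C", "brand": "Chanel", "title": "Boy bag", "price": 1000, "currency": "EUR"},
]

signals = similar_listing_clusters(sample_rows)
print(signals)
```

Fixed bands have an obvious limit: two prices just either side of a band edge will not cluster. That is acceptable for a review queue, which only needs to shortlist candidates for a human.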

A practical run I would use

Here is a deal-finding run:

{
  "searchTerms": ["chanel classic flap"],
  "countries": ["ALL"],
  "sellerCountries": ["ALL"],
  "itemConditions": ["3", "4"],
  "requiredKeywords": ["classic flap"],
  "maxListings": 50,
  "maxDatasetRecords": 80,
  "maxPagesPerCountry": 2,
  "includeDetails": true,
  "includeSellerInfo": true,
  "includeDuplicateSignals": true,
  "includeSoldItems": false
}

Why countries: ["ALL"]?

Because you do not always know which market locale will expose the relevant listing.

Why sellerCountries: ["ALL"]?

Because for deal discovery, I usually want to see the whole market first, then narrow by seller country after looking at the data.

If I only want French sellers, I change one line:

{
  "sellerCountries": ["FR"]
}

That turns a broad market scan into a seller-location scan.
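If you keep run inputs in code, the "change one line" idea can be sketched as a base input plus overrides. The input keys come from the deal-finding run above. The helper name is hypothetical, and nothing here calls Apify; it only builds the JSON you would pass as the run input.

```python
import copy
import json

# The deal-finding input from this post, kept as the single source of truth.
BASE_INPUT = {
    "searchTerms": ["chanel classic flap"],
    "countries": ["ALL"],
    "sellerCountries": ["ALL"],
    "itemConditions": ["3", "4"],
    "requiredKeywords": ["classic flap"],
    "maxListings": 50,
    "maxDatasetRecords": 80,
    "maxPagesPerCountry": 2,
    "includeDetails": True,
    "includeSellerInfo": True,
    "includeDuplicateSignals": True,
    "includeSoldItems": False,
}


def input_variant(base, **overrides):
    """Return a copy of the base input with selected keys replaced."""
    run_input = copy.deepcopy(base)  # never mutate the shared base
    run_input.update(overrides)
    return run_input


# Broad market scan turned into a seller-location scan.
french_sellers = input_variant(BASE_INPUT, sellerCountries=["FR"])
print(json.dumps(french_sellers, indent=2))
```

The deep copy matters because the input contains lists. Without it, editing a variant could silently change the base run.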

The output should be readable

The useful row is not only the product.

It is the product plus the context around it:

{
  "recordType": "listing",
  "displayStatus": "Available",
  "isSold": false,
  "listingId": "67486244",
  "brand": "Chanel",
  "title": "Classic flap bag",
  "price": 1000,
  "currency": "EUR",
  "country": "FR",
  "sellerCountry": "FR",
  "condition": "Very good condition",
  "conditionSource": "vestiaire",
  "url": "https://fr.vestiairecollective.com/..."
}

That is the difference between scraping and workflow design.

When I would use this Actor

This is the checklist I would use to decide whether the Actor fits a job:

You want to compare active listings and sold signals.
You need seller-country filtering.
You want to monitor price changes over time.
You care about item condition.
You want product details beyond search cards.
You want duplicate or suspicious listing review queues.
You want JSON, CSV, Excel, or API export from Apify.
You want scheduled runs without maintaining your own scraper server.

By default, when countries is empty or set to ALL, the Vestiaire Collective Smart Scraper searches all 70 supported Vestiaire market countries. You can restrict countries when you need smaller runs.

The goal is not to force one workflow.

The goal is to let users define their own risk model.

What it does not claim

This part matters.

The Actor does not prove that an item is authentic.

It does not accuse sellers.

It does not bypass private areas.

It works with public Vestiaire Collective data and turns public marketplace signals into structured rows for research, monitoring, and manual review.

A scraper should not invent certainty.

It should make uncertainty easier to inspect.

My takeaway

Reading the reviews changed what I built.

The useful product was not:

"Give me title, price, URL."

It was:

"Help me decide what deserves attention."

That is why the Actor is built around sold items, seller intelligence, price tracking, condition filters, product details, and duplicate signals.

If you were building resale marketplace intelligence, what would you track next?

Seller tenure, fee-adjusted net price, image similarity, shipping country, sold-date reliability, or something else?

DEV Community