@karashiiro commented Jan 19, 2025

Implements support for custom rate-limiting strategies, replacing the fixed default behavior of waiting until the current rate-limit period expires. This makes rate-limit handling significantly more flexible, and lets library consumers implement RateLimitStrategy themselves to add monitoring and/or logging when rate-limiting occurs.

This also adds a new implementation of this strategy that throws an error when rate-limited:

import { Scraper, ErrorRateLimitStrategy } from "@the-convocation/twitter-scraper";

const scraper = new Scraper({
  rateLimitStrategy: new ErrorRateLimitStrategy(),
});
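Consumers can also supply their own strategy. The sketch below shows what a custom logging strategy might look like; the interface shape here is an assumption for illustration (check the library's exported RateLimitStrategy type for the real signature):

```typescript
// ASSUMPTION: these interfaces are stand-ins for the library's real exports,
// defined locally so the sketch is self-contained.
interface RateLimitEvent {
  // Hypothetical field describing the rate-limited request.
  endpoint: string;
}

interface RateLimitStrategy {
  onRateLimit(event: RateLimitEvent): Promise<void>;
}

// Logs every rate-limit hit, then waits a fixed back-off before the
// scraper retries the request.
class LoggingRateLimitStrategy implements RateLimitStrategy {
  constructor(private readonly backoffMs: number) {}

  async onRateLimit(event: RateLimitEvent): Promise<void> {
    console.warn(`Rate-limited on ${event.endpoint}; backing off ${this.backoffMs}ms`);
    await new Promise((resolve) => setTimeout(resolve, this.backoffMs));
  }
}
```

A strategy like this could then be passed as `rateLimitStrategy` in the Scraper options, the same way ErrorRateLimitStrategy is above.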

On pooling (tl;dr not yet but now you can DIY it)

One of the primary desired use cases for this is pooling auth to rotate between users automatically whenever one gets rate-limited. Implementing this within the Scraper class will require a more significant refactor of the scraper, so it is out of scope at this time. As a workaround, library consumers can use ErrorRateLimitStrategy to throw immediately when rate-limited, and pool Scraper instances themselves (pseudocode):

async function createScraper(args) {
  const scraper = new Scraper({ rateLimitStrategy: new ErrorRateLimitStrategy() });
  await scraper.login(...args);
  return scraper;
}

const scrapers = new Map<UserId, Scraper>([
  ['user-1', await createScraper(authInfo1)],
  ['user-2', await createScraper(authInfo2)]
]);

// later...
async function getTweets(args) {
  for (const [userId, scraper] of scrapers.entries()) {
    try {
      // Await here so a rate-limit rejection is caught by this try/catch
      // instead of escaping as an unhandled promise rejection.
      return await scraper.getTweets(...args);
    } catch (err) {
      console.warn(`Scraper "${userId}" is currently rate-limited`);
    }
  }

  throw new Error('All scrapers are currently rate-limited.');
}

The above pseudocode is not the only (or the cleanest) way this can be done, but (for now) the specifics are left as an exercise for library consumers.

Given that the code strongly assumes being authenticated as only one user at a time, that may well be how this library implements pooling in the future, to avoid the aforementioned refactor: perhaps a PooledScraper with the same public interface that internally manages multiple Scraper instances.
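To make the idea concrete, a pool along those lines could be sketched as follows. Note that PooledScraper and its run helper are hypothetical names; nothing like this exists in the library today:

```typescript
// HYPOTHETICAL: a pool that tries each underlying scraper in turn and
// moves on when one throws (e.g. via ErrorRateLimitStrategy).
class PooledScraper<S> {
  constructor(private readonly scrapers: Map<string, S>) {}

  // Runs `fn` against each scraper until one succeeds; throws if all fail.
  async run<T>(fn: (scraper: S) => Promise<T>): Promise<T> {
    for (const [userId, scraper] of this.scrapers.entries()) {
      try {
        return await fn(scraper);
      } catch (err) {
        console.warn(`Scraper "${userId}" is currently rate-limited`);
      }
    }
    throw new Error("All scrapers are currently rate-limited.");
  }
}
```

Usage would look something like `await pool.run((s) => s.getProfile(handle))`, keeping per-call logic out of the pool itself.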


Resolves #81, closes #115, partially addresses #87

@karashiiro commented
Ran tests locally: the search and tweets test suites still fail due to #114; no regressions elsewhere.

@karashiiro karashiiro merged commit df49ca4 into the-convocation:main Jan 19, 2025
0 of 2 checks passed
@karashiiro karashiiro deleted the feat/configurable-rate-limiter branch January 19, 2025 02:41
@pkdev08 commented Jan 19, 2025

Hi @karashiiro, for some reason when I do

export async function getTweet(client, id) {
  for (const [userId, scraper] of client.scrapers.entries()) {
    try {
      return await scraper.getTweet(id);
    } catch (err) {
      logger.warn(`Scraper "${userId}" had an error:`, err.message);
    }
  }

  logger.error('All scrapers currently have errors.');
}

the err.message is undefined, and err is a pending promise even though I am awaiting getTweet().

@karashiiro commented

Looks like an oversight in how the error was constructed, missed due to a bad test; a fix will be out shortly.

@karashiiro commented

Fixed in v0.15.1 @pkdev08


Successfully merging this pull request may close these issues: "Stop getting tweets after some time, without timeout" and "Let the user choose how to handle rate limits".
