This repository was archived by the owner on Mar 17, 2020. It is now read-only.

API scraper #3

Open: siennathesane wants to merge 5 commits into master from docs-scraper



siennathesane (Owner) commented Feb 21, 2019

Adding the docs scraper to crawl the docs site and express all the Windows APIs. It's highly concurrent, but there is a lot of room for optimisation. So far I've only tested it against the Windows Desktop docs, but once I flesh out the last bit of bugs, it should be good to go.
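For readers less familiar with the pattern, a bounded worker pool is one common way to keep a crawler like this highly concurrent without spawning a goroutine per page. This is an illustrative sketch, not the code in this PR: `crawlAll` and the injected `fetch` function are hypothetical, and `fetch` stands in for the real HTTP fetch and HTML parsing step.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// crawlAll fans the given page URLs out to a fixed number of workers.
// fetch is a hypothetical stand-in for the real fetch + parse step;
// it returns the API names found on one page.
func crawlAll(urls []string, workers int, fetch func(string) []string) []string {
	jobs := make(chan string)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				for _, name := range fetch(u) {
					results <- name
				}
			}
		}()
	}

	// Feed the jobs, then close the channel so workers exit their range loops.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Close results only after every worker has finished sending.
	go func() {
		wg.Wait()
		close(results)
	}()

	var names []string
	for n := range results {
		names = append(names, n)
	}
	sort.Strings(names) // workers finish in any order; sort for stable output
	return names
}

func main() {
	fake := func(u string) []string { return []string{u + "/CreateFileW"} }
	fmt.Println(crawlAll([]string{"a", "b", "c"}, 2, fake))
}
```

The `workers` argument is the knob for tuning concurrency: it caps how many pages are in flight at once, independent of how many URLs are queued.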

@aaronmsft @bitcrazed @erikstmartin this should really help grease the wheels on some of our conversations.

Signed-off-by: Mike Lloyd <mike@reboot3times.org>

…s. it's highly concurrent, but there is a lot of room for optimisation improvements.

Signed-off-by: Mike Lloyd <mike@reboot3times.org>
todo bot commented Feb 21, 2019

(mxplusb): this should stream more efficiently.

https://github.com/mxplusb/windows/blob/9f11f25c4629389c7623d333e1c6f860ff8e6aeb/docs-scraper/main.go#L73-L78


This comment was generated by todo based on a TODO comment in 9f11f25 in #3. cc @mxplusb.

Signed-off-by: Mike Lloyd <mike@reboot3times.org>
todo bot commented Feb 21, 2019

(mxplusb): figure out why this only works sometimes and not others. it seems to be specific to the DX libraries for some reason.

https://github.com/mxplusb/windows/blob/85a57a03ebc35ed86fe98b3fa63bb66eba96b888/docs-scraper/main.go#L220-L225


This comment was generated by todo based on a TODO comment in 85a57a0 in #3. cc @mxplusb.

@siennathesane siennathesane marked this pull request as ready for review February 21, 2019 08:56
@siennathesane siennathesane added this to the API Template Generation milestone Feb 21, 2019
Signed-off-by: Mike Lloyd <mike@reboot3times.org>
Signed-off-by: Mike Lloyd <mike@reboot3times.org>
Added nil pointer checks to fix dereference panics.
Added version regular expressions for later use.
Added a global counter for humanity's sake.
Split the version table buffer out from the code block buffer.
Removed the remarks regular expression because it produced garbage data.

There's still a deadlock or race condition somewhere, but it's really
inconsistent: sometimes it stalls at ~3k functions found, other times at
~17k. I don't know where the problem is. I'm wondering if it's on the
Microsoft side, due to all of the crawling traffic, and whether it might
look like a DoS to them.

Signed-off-by: Mike Lloyd <mike@reboot3times.org>
siennathesane (Owner, Author)

Welp, I got blacklisted from the docs website, haha.


siennathesane (Owner, Author)

I need to add a rate limiter.

@siennathesane siennathesane self-assigned this Mar 12, 2019