```haskell
me <- lift CL.head
case me of
  Just (EventSequenceStart _ _ a) -> parseArrayStreaming 0 a
  _ -> liftIO $ throwIO $ UnexpectedEvent me $ Just $ EventSequenceStart NoTag AnySequence Nothing
```
I added some values here for the "expected" event, hoping that would be more helpful than using `Nothing`.
```haskell
-- In combination with 'Y.decodeFile', it can be used to consume
-- a yaml file that consists of a very large list of values.
parseStream :: ReaderT JSONPath (ConduitM Event Value Parse) ()
parseStream = do
```
I can change the name of this if there's a better name for it.
```diff
 name: yaml
-version: 0.11.7.0
+version: 0.11.8.0
```
I didn't change this myself. I don't know why it kept happening.
I don't think this would make a good addition here at this time. Such an approach would make more sense as a separate library, to allow the API design to evolve instead of trying to promote it to a stable package immediately. One thought is that it seems to me like the
OK. It might be a good idea, but I don't have the time right now to create a separate project. Perhaps another time. For now, I will just use my fork and merge any updates from your repository. The reason I added it here instead of keeping it in my own code is that it uses internal functions that are not exported, and I also thought it might be useful to other people. I simply copied

Thanks for the prompt reply!
I added the ability to consume yaml files incrementally using conduits. This is exposed by the `parseStream` function in the module `Data.Yaml.Internal`.

The motivation was simply wanting to consume large files without having them fully in memory. I don't know what the demand for something like this is, but I wanted it.
I was thinking of maybe defining a wrapper in `Data.Yaml` that would be easier to use and discover. However, I wasn't sure what that would look like. In any case, exporting the conduit works well enough for me.

I also had to export `ParseState`, since otherwise I wasn't able to run the parser.

To try it out, I wrote the following code:
```haskell
{-# LANGUAGE ImportQualifiedPost, OverloadedStrings #-}

module Main (main) where

import Data.Void (Void)
import Data.Aeson (FromJSON, parseJSON, withObject, (.:), fromJSON)
import Data.Aeson qualified as JSON
import Conduit (ConduitM, runConduit, (.|), runResourceT)
import Data.Conduit.List qualified as CL
import Data.Yaml.Internal (Parse, ParseState (..), parseStream)
import Control.Monad.Reader (runReaderT)
import Control.Monad.State.Strict (evalStateT)
import Text.Libyaml (decodeFile)

data ID = ID { id_field :: Int }

instance FromJSON ID where
  parseJSON = withObject "ID" $ \o -> fmap ID $ o .: "id"

main :: IO ()
main = do
  let f :: Int -> JSON.Value -> Int
      f i v = case fromJSON v of
        JSON.Error err -> error err
        JSON.Success x -> i + id_field x
  let conduit :: ConduitM () Void Parse Int
      conduit = decodeFile "big.yaml" .| runReaderT parseStream [] .| CL.fold f 0
  r <- runResourceT $ evalStateT (runConduit conduit) $ ParseState mempty []
  print r
```

And then I created a 1.5 GB file (`big.yaml`). The memory still goes up as the program runs, but it does so very slowly, and the program finishes before it is a problem (it ends up taking close to 3 GB, though). Trying to parse the entire file at once quickly used all of my available memory (31.2 GB). I was expecting the conduit version to take a roughly constant amount of memory, so I wonder what causes the slow build-up. Could it be related to the parser state?

Thanks in advance for your review.
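One thing worth ruling out first (a guess, not a diagnosis of this particular patch): the classic Haskell cause of a slow memory build-up during a streaming fold is lazy accumulation, where a lazy left fold retains a chain of unevaluated thunks proportional to the input length. `Data.Conduit.List.fold` is documented as a strict left fold, so the accumulator in the example above should be fine, which would point the suspicion at the parser state instead; still, the pattern is easy to illustrate with plain base:

```haskell
-- Self-contained sketch using only base (not the yaml library): a lazy
-- left fold builds ~n unevaluated (+) thunks before anything forces
-- them, while the strict variant keeps a single evaluated accumulator
-- and runs in constant space.
import Data.List (foldl')

sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0  -- thunk chain grows with the input
sumStrict = foldl' (+) 0  -- accumulator forced at every step

main :: IO ()
main = print (sumStrict [1 .. 1000000])  -- prints 500000500000
```

Heap profiling the real program (building with `-prof` and running with `+RTS -hc`) would show whether the retained memory is attributed to the fold, to the conduit pipeline, or to the parser state.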