
Addition of mode "fastload", as well as support for 2-byte and 4-byte precision#13

Merged
leonbohmann merged 7 commits into leonbohmann:dev from hakonbar:master
on Apr 27, 2022

Conversation

@hakonbar
Contributor

Kind of a big pull request, introducing two features at once. They've been implemented in such a way as to not interfere with existing functionality.

  • Added a 'fastload' mode, which takes advantage of the fact that consecutive data points in a measurement channel are stored as a contiguous "byte chunk" in the catman binary format instead of blockwise. You therefore only need to pass a pointer to the first byte as well as the length of the chunk.
  • Added the method "Channel.readExtHeader" in order to get at the attribute "ExportFormat". This attribute indicates the byte depth (precision) of the measurement file, allowing the reading algorithm to distinguish between the formats.
  • Added the method "BinaryReader.read_float", which reads in 4-byte floating point numbers.
  • Changed the name of the method "read_single" to "read_byte" to avoid confusion with the newly added method.
  • Added some sample data from HBK with 2-, 4- and 8-byte data.
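A minimal sketch of the fastload idea (the function and the depth-to-dtype mapping here are illustrative assumptions, not the package's actual API): because a channel's samples are contiguous on disk, a single `np.fromfile` call with an offset and a count replaces per-sample reads.

```python
import numpy as np

# Hypothetical mapping from a byte depth (as indicated by ExportFormat)
# to a numpy dtype; the real codes in the catman format may differ.
DEPTH_TO_DTYPE = {2: np.int16, 4: np.float32, 8: np.float64}

def fastload_channel(path, offset, count, byte_depth):
    """Read `count` consecutive samples starting at byte `offset`."""
    dtype = DEPTH_TO_DTYPE[byte_depth]
    return np.fromfile(path, dtype=dtype, count=count, offset=offset)
```

Reading the whole chunk in one call avoids the Python-level loop of the blockwise reader.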

# Added a 'fastload' option, which reads the channel data to a numpy array.

# Added support for reading measurement data with 2-byte and 4-byte precision
   - added the method 'read_float()'
   - renamed the method 'read_single()' to 'read_byte()' to avoid confusion.

# Added test files with data at the different precision levels.
The code now sets the attribute "ExportFormat" to zero instead of throwing away the entire extended header.
Owner

@leonbohmann left a comment


If I understand correctly, "fastload" will instruct the Channel to load its data using NumPy's np.fromfile implementation?

Do you think it would be a good idea to make fastload the default way of loading data? I mean, if it does the same thing, only faster... If so, you could change the default value for it.

Owner

@leonbohmann left a comment


OK, so all in all this seems fine. The only thing I don't get yet is the scale factor...

@leonbohmann added the question (Further information is requested) and enhancement (New feature or request) labels, then removed the question label, on Apr 26, 2022
@leonbohmann
Owner

I am merging this into a new branch "dev" which I will use to develop the future release. Just to keep things organized!

@leonbohmann leonbohmann changed the base branch from master to dev April 27, 2022 19:03
@leonbohmann leonbohmann merged commit f37cc49 into leonbohmann:dev Apr 27, 2022
@leonbohmann
Owner

The fastload option actually breaks the conversion to JSON format: when using fastload, the channels' data objects are created as ndarray, which is not JSON-serializable.

@hakonbar
Contributor Author

Hi Leon, you could use a check like "isinstance(data, ndarray)" (or "type(data) is ndarray") to see whether the data is stored in a numpy array, and if so, apply a statement like "data_to_write = data.tolist()" or "data_to_write = list(data)" before writing to json or csv.
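As a concrete sketch of that check (the helper name is illustrative):

```python
import json
import numpy as np

def jsonable(data):
    # numpy arrays are not JSON-serializable; plain lists are
    if isinstance(data, np.ndarray):
        return data.tolist()
    return data

channel_data = np.array([1.0, 2.5, 3.0])
json.dumps(jsonable(channel_data))  # works; json.dumps(channel_data) would raise TypeError
```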

For numerical data like we have here, however, it's usually better to use a binary format when writing to disk, as this requires a lot less storage space and makes for faster reading and writing of files. One suggestion here would be to implement a save method which uses the "pickle" module to pickle the whole object. This could be useful for users who want to use Python for further processing. One could also implement a method which writes the measurement data to parquet and the metadata to json. This would make the output more portable.

When working with homogeneous numerical data (where all values in the data structure are of the same datatype), numpy arrays are usually orders of magnitude faster than lists. The numpy and scipy libraries have also implemented heaps of functions which are optimized for just this data structure. You could therefore consider converting the measurement data to an ndarray also when not using fastload. I think your filtering function "lfilt()" might also return an ndarray, but I'm not sure.
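A small illustration of the difference (the exact speed-up depends on the machine and the data size):

```python
import numpy as np

samples = [0.1 * i for i in range(1_000_000)]
arr = np.asarray(samples)

# List version: an interpreted loop, touching one Python object per value
scaled_list = [2.0 * x for x in samples]

# ndarray version: one vectorized C-level operation over the whole buffer
scaled_arr = 2.0 * arr
```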

@leonbohmann
Owner

Yes, you are right about that. Maybe I'll just overhaul the datatype of the data completely and switch to ndarray. Then it'll be up to the user to decide how to save it.

This package should focus on reading the data only; most users will probably create their own plots and files either way...

@hakonbar
Contributor Author

hakonbar commented May 20, 2022

In that case, the pickle module would be a perfect fit. It allows you to dump an item in your workspace to file with only a few lines of code. The file can then just as easily be loaded into the workspace again in a later Python session for further processing. See a code example below (excuse my Norwegian code):

```python
import os
import pickle
from pathlib import Path

def lag_pickle(mappe_lagre, objekt, filnavn):
    # Force a .pkl extension on the output file name
    if not filnavn.endswith('.pkl'):
        filnavn = Path(filnavn).stem + '.pkl'

    with open(os.path.join(mappe_lagre, filnavn), 'wb') as outp:
        pickle.dump(objekt, outp, pickle.HIGHEST_PROTOCOL)

def hent_pickle(mappe_last, filnavn):
    with open(os.path.join(mappe_last, filnavn), 'rb') as inp:
        objekt = pickle.load(inp)

    return objekt
```

@hakonbar
Contributor Author

By the way, I've found a bug with the fastload mode which occurs when the file has fewer datapoints than is indicated in the header. The regular mode raises an IndexError there, but fastload doesn't, and produces gibberish instead. I'll try and fix it.

@leonbohmann
Owner

> By the way, I've found a bug with the fastload mode which occurs when the file has fewer datapoints than is indicated in the header. The regular mode raises an IndexError there, but fastload doesn't, and produces gibberish instead. I'll try and fix it.

That will be a problem for reading via the original method as well. Therefore we should consider some error handling to prevent the code from failing with either method...
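One way to guard both code paths (a sketch; the function name and the choice of IndexError are assumptions, not the package's actual error handling) is to compare the bytes actually present against what the header promises before reading:

```python
import os
import numpy as np

def checked_fromfile(path, dtype, count, offset=0):
    """Refuse to read when the file is shorter than the header claims,
    instead of silently returning gibberish."""
    available = os.path.getsize(path) - offset
    needed = count * np.dtype(dtype).itemsize
    if available < needed:
        raise IndexError(
            f"file provides {available} bytes after the header, "
            f"but {needed} bytes ({count} samples) were promised")
    return np.fromfile(path, dtype=dtype, count=count, offset=offset)
```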

@leonbohmann
Owner

leonbohmann commented May 20, 2022

> In that case, the pickle module would be a perfect fit....

Yes, true. But I think this package should then only be used to convert the binary data to some ndarray in Python. The second step will be up to the user.

While using the package myself, I realised that I tend to make a lot of changes in the package just so it fits my needs. It'll be more efficient if we keep things and responsibilities simple, I think!

@leonbohmann
Owner

New version is released containing your changes. I think the extended header data is really helpful as well, so I included that in the readme!

@LarissaPestana

Hello Leon, is it possible to convert the read file into .xlsx?

@leonbohmann
Owner

For questions and feature requests, please create a new issue.

Surely it is possible, but unfortunately that functionality is not part of this package. I did a quick search and found out that you can convert a pandas dataframe to Excel. For that, you would have to convert the channels to a dataframe first.

The other option would be to save the data as a csv file; you can open that with Excel directly and save it as xlsx from there!
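For illustration, with placeholder channel data (writing .xlsx via `DataFrame.to_excel` additionally requires an engine such as openpyxl to be installed):

```python
import pandas as pd

# Placeholder data; in practice these values come from the reader's channels
channels = {"Time": [0.0, 0.1, 0.2], "Force": [1.2, 3.4, 5.6]}
df = pd.DataFrame(channels)

df.to_csv("measurement.csv", index=False)
# With openpyxl installed, the same frame can go straight to Excel:
# df.to_excel("measurement.xlsx", index=False)
```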

